JAMES BERNOULLI’S THEOREM.
By KARL PEARSON. 


(1) In the course of lectures on the history of statistics given during the last 
three or four years, I have felt it needful to consult original papers and have been 
struck by the manner in which the history of science gets created and attributions 
accepted. Instances of this may be found in such widespread notions as that 
Leibnitz was the first to use differentials, that Gauss was the discoverer of the 
normal curve of errors, that Lagrange invented Lagrange’s formula, that Bessel 
originated Bessel’s method of interpolation, that Bravais introduced the coefficient 
of correlation, or that James Bernoulli demonstrated James Bernoulli’s Theorem. 
It is with the latter point that I wish to concern myself in the present paper. 
In all the French and German works on probability with which I am acquainted, 
there is a chapter devoted to “James Bernoulli’s Theorem.” The statement of this 
theorem is usually summed up in some such phrase as this: Accuracy increases 
with the square root of the number of observations. And the fact that the 
constants of frequency distributions in the case of large samples have standard 
deviations varying inversely as the square root of the size of the sample is 
repeatedly spoken of as an illustration of Bernoulli’s Principle, or as part of the 
generalised theorem of Bernoulli. The proofs provided of Bernoulli’s Theorem in 
the text-books referred to can be traced with variations—for the majority of text- 
book writers copy their predecessors as well in mathematics as in anatomy—back 
to Laplace, and through him to De Moivre’s second supplement to the Miscellanea 
Analytica, 1733. 


Usually there is a more or less vague reference to the Ars Conjectandi of 1713. 
It is not, however, my purpose to trace here the tortuous process by which 
De Moivre’s Theorem has in the course of historical evolution become associated 
with Bernoulli. I want rather to discuss what Bernoulli really did achieve, to 
investigate whether his method of approaching the subject can be profitably 
pushed further, and to indicate that if the text-book writers had really examined 
the Ars Conjectandi they would never have attributed to Bernoulli the law that 
accuracy varies with the square root of the number of observations. There is no such 
law and no near approach to such a law to be found in the fourth and fifth chapters 
of the Pars Quarta of Bernoulli’s Ars Conjectandi. What Bernoulli endeavours to 
do is to prove that by increasing sufficiently the number of observations he can 
cause the probability (i.e. that the ratio of observed successful to unsuccessful
occurrences will differ from the true ratio within certain small limits) to diverge
from certainty by less than any assignable amount. Now the problem is clearly one which
depends on summing any number of terms of a binomial, and therefore will be 
solved with complete accuracy—at any rate for the range of the arguments of the 
table—when the Tables of the Incomplete B-Function now under construction are 
published*. Approximations to the sum of terms of a binomial may be made in 
various ways, for example by the normal curve as was done by De Moivre for this 
very purpose. But James Bernoulli did not proceed in this manner; he adopted a
very crude method of inequalities which I will shortly reproduce. Let us suppose
that N is the true number of observations which must be made to give the
required value of the probability, then clearly if m be >1, mN will give a
probability falling within the required limits. But the practical value of the solution
must depend on m being nearly unity. This is far from the case with Bernoulli’s
solution. He gets most exaggerated values for the needful number of
observations, and for this reason his solution must be said to be from the practical
standpoint a failure; it would ruin either an insurance society or its clients, if
it were adopted. All Bernoulli achieved was to show that by increasing the
number of observations the results would undoubtedly fall within certain limits,
but he failed entirely to determine what the adequate number of observations was
for such limits. That was entirely De Moivre’s discovery.


(2) Bernoulli's Attempt at Solution. I propose to reproduce this in its main 
features, without following Bernoulli verbatim. 


Bernoulli considers the binomial $(r+s)^{nt}$, where $t = r + s$. The general term
of this is
$$\frac{(nt)!}{(nt-p)!\,p!}\; r^{nt-p} s^{p},$$
and the greatest term is that for which $p = ns$, i.e.
$$\frac{(nt)!}{(nr)!\,(ns)!}\; r^{nr} s^{ns}.$$

In the binomial the powers of $r$ will be those in the first, the corresponding
powers of $s$ those in the second line below:
$$nr+ns,\ \ nr+ns-1,\ \ldots,\ nr+n,\ \ldots,\ nr,\ \ldots,\ nr-n,\ \ldots,\ 3,\ 2,\ 1,\ 0,$$
$$0,\ \ 1,\ \ldots,\ ns-n,\ \ldots,\ ns,\ \ldots,\ ns+n,\ \ldots,\ ns+nr-1,\ ns+nr.$$

From the maximum term to $nr - n$ we have $n$ terms and from the term $nr + n$
to the maximum term we have another $n$ terms; beyond the $(nr-n)$th term we
have $(r-1)n$ terms and before the $(nr+n)$th, $(s-1)n$ terms. Bernoulli proposes
to consider the ratio of the sum of the $n$ terms from $nr - 1$ to $nr - n$ inclusive to
the sum of the $n(r-1)$ beyond, and the sum of the $n$ terms from $nr + n$ to $nr + 1$
to the sum of the $n(s-1)$ terms which precede the $(nr+n)$th term; i.e. omitting
the $nr$th term for the moment, he is considering the sum of the $2n$ middle terms
in relation to the sum of the remaining tails. Bernoulli notes that on either side


* See Biometrika, Vol. XVI. p. 68.
of the greatest term the terms continually decrease, and that their rate of decrease
becomes more and more rapid. We shall follow Bernoulli’s process a little more
clearly if we write the term containing the $p$th power of $r$ as $u_p$. Now let us
suppose $c$ a quantity such that:
$$c < \frac{u_{nr+1} + u_{nr+2} + \ldots + u_{nr+n}}{u_{nr+n+1} + u_{nr+n+2} + \ldots + u_{nr+n+n} + n(s-2)\ \text{other terms}} \tag{i}$$
Next let $u_{nr}/u_{nr+n} = x$.

By what precedes
$$\frac{u_{nr}}{u_{nr+1}} < \frac{u_{nr+n}}{u_{nr+n+1}},\qquad \frac{u_{nr+1}}{u_{nr+2}} < \frac{u_{nr+n+1}}{u_{nr+n+2}},\qquad \frac{u_{nr+2}}{u_{nr+3}} < \frac{u_{nr+n+2}}{u_{nr+n+3}},\ \ldots$$
Therefore:
$$\frac{u_{nr+1}}{u_{nr+n+1}} < \frac{u_{nr+2}}{u_{nr+n+2}} < \frac{u_{nr+3}}{u_{nr+n+3}},\ \text{and so on;}$$
while $u_{nr}/u_{nr+n} = x$ is less than all these.
Thus: $x\,u_{nr+n+1} < u_{nr+1}$, $x\,u_{nr+n+2} < u_{nr+2}$, and so on.
Accordingly:
$$x < \frac{u_{nr+1} + u_{nr+2} + \ldots + u_{nr+n}}{u_{nr+n+1} + u_{nr+n+2} + \ldots + u_{nr+n+n}} \tag{ii}$$
But: $u_{nr+n+1} + u_{nr+n+2} + \ldots + u_{nr+n+n} + n(s-2)$ other terms
is $< (s-1)\,(u_{nr+n+1} + u_{nr+n+2} + \ldots + u_{nr+n+n})$,
and
$$\frac{x}{s-1} < \frac{u_{nr+1} + u_{nr+2} + \ldots + u_{nr+n}}{(s-1)\,(u_{nr+n+1} + u_{nr+n+2} + \ldots + u_{nr+n+n})} < \frac{u_{nr+1} + u_{nr+2} + \ldots + u_{nr+n}}{u_{nr+n+1} + u_{nr+n+2} + \ldots + u_{nr+n+n} + n(s-2)\ \text{other terms}} \tag{iii}$$
Now this ratio is to be greater than $c$. If then we take $c < x/(s-1)$ it will follow that
$$\frac{u_{nr+1} + u_{nr+2} + \ldots + u_{nr+n}}{u_{nr+n+1} + u_{nr+n+2} + \ldots + u_{nr+n+n} + n(s-2)\ \text{other terms}}$$
will be $> c$.
We can deal similarly with the terms beyond the maximum term, and if we
choose the same value of $c$ we shall have:
$$x' > c\,(r-1),$$
where $x' = u_{nr}/u_{nr-n}$ is the corresponding ratio on that side.
Accordingly we have:
$$u_{nr+1} + u_{nr+2} + \ldots + u_{nr+n} > c\,(u_{nr+n+1} + \ldots + u_{nr+n+n} + n(s-2)\ \text{other terms}),$$
$$u_{nr-1} + u_{nr-2} + \ldots + u_{nr-n} > c\,(u_{nr-n-1} + \ldots + u_{nr-n-n} + n(r-2)\ \text{other terms}),$$
and of course:
$$u_{nr+n} + \ldots + u_{nr+1} + [u_{nr}]^{*} + u_{nr-1} + \ldots + u_{nr-n} > c\,\{\text{sum of both remaining tails}\} \tag{iv}$$


* Bernoulli does not refer to the maximum term $u_{nr}$, but it clearly only increases the inequality.
The question next to be answered is what value shall we give to $x$?
Now:
$$x = \frac{u_{nr}}{u_{nr+n}} = \frac{(nt)!}{(nr)!\,(ns)!}\, r^{nr} s^{ns} \Big/ \frac{(nt)!}{(nr+n)!\,(ns-n)!}\, r^{nr+n} s^{ns-n}$$
$$= \frac{(nr+n)(nr+n-1)\ldots(nr+1)\; s^{n}}{ns\,(ns-1)\ldots(ns-n+1)\; r^{n}}$$
$$= \frac{(nrs+ns)(nrs+ns-s)\ldots(nrs+s)}{(nrs-nr+r)(nrs-nr+2r)\ldots nrs},$$
if we write the denominatorial factors in reverse order. Thus we have, dividing
the factors by $n$:
$$x = \frac{rs+s}{rs-r+\dfrac{r}{n}} \cdot \frac{rs+s-\dfrac{s}{n}}{rs-r+\dfrac{2r}{n}} \cdot \frac{rs+s-\dfrac{2s}{n}}{rs-r+\dfrac{3r}{n}} \cdots \frac{rs+\dfrac{s}{n}}{rs} \tag{v}$$

Now all these factor-ratios are greater than unity and the first is the greatest.
Their values all lie between
$$\frac{rs+s}{rs-r+\dfrac{r}{n}} \quad\text{and}\quad \frac{rs+\dfrac{s}{n}}{rs}.$$
Bernoulli assumes that the $m$th factor-ratio will be $\dfrac{rs+s}{rs}$ and determines $m$
from
$$\frac{rs+s-\dfrac{(m-1)s}{n}}{rs-r+\dfrac{mr}{n}} = \frac{rs+s}{rs} = \frac{r+1}{r} \tag{vi}$$
He notes that all the factor-ratios above the $m$th are greater than the $m$th and
all the following factor-ratios are greater than unity. Thus he finds that
$$x > \left(\frac{r+1}{r}\right)^{m}.$$
Accordingly if we choose $n$ so that
$$c\,(s-1) < \left(\frac{r+1}{r}\right)^{m} \tag{vii}$$
$c\,(s-1)$ will certainly be less than $x$, which is our fundamental condition. But it
follows from (vi) and (vii) that
$$n = m\left(1 + \frac{s}{r+1}\right) - \frac{s}{r+1}, \qquad m > \frac{\log c\,(s-1)}{\log (r+1) - \log r}.$$
Similar reasoning applies to the other side of the maximum term. It accordingly
follows that if $n$ be given the larger of the two values below, i.e.
$$n = \frac{\log c\,(s-1)}{\log (r+1) - \log r}\left(1 + \frac{s}{r+1}\right) - \frac{s}{r+1},$$
$$n = \frac{\log c\,(r-1)}{\log (s+1) - \log s}\left(1 + \frac{r}{s+1}\right) - \frac{r}{s+1}, \tag{viii}$$
then we can make the frequency of the central series of terms larger than $c$ times
the frequency of the sum of the tails. But the central series of terms measures
the frequency from the $(nr-n)$th term to the $(nr+n)$th term inclusive. Bernoulli
now applies this to the theory of probabilities. Let $p_0 = r/(r+s)$ be the true
probability of success, then in $nt$ trials the probability of a success occurring
between $nr-n$ and $nr+n$ times, or an average of successes between $(nr-n)/nt$
and $(nr+n)/nt$, i.e. between $p_0 \pm \frac{1}{t}$, will be measured by the ratio of the sum of the
central series of terms to the sum of all the terms. If now we take $\frac{1}{t}$ any small
value we please and take $n$ the larger of the two values in (viii), then we are
certain that in $nt$ trials the ratio of successes to total trials will have a probability
greater than $\dfrac{c}{c+1}$, or the odds are $c$ to 1 that the ratio of successes to trials will
fall within the limits $p_0 \pm \frac{1}{t}$.

Bernoulli then turns the problem round and says that if the observed value in
$nt$ trials be $p$, then the true value $p_0$ will lie between $p \pm \frac{1}{t}$ with the given
probability. This is rather stated than proved, but it is of course the kernel of much
later developments of importance. Leibnitz raised objections to it*.

* See our p. 206.

Now let us look at the modern method of dealing with this problem. Suppose
$p$ the probability of an event happening, $q$ of its failing, then if the trial be made
$m$ times the frequency will be distributed according to the terms of the binomial
$(p+q)^{m}$. The most probable result is $mp$, and the mean proportion of successes $p$.
If we want to consider a result which does not differ more than $\pm\,\epsilon$ from $p$ we may
obtain its probability by summing the terms of the binomial from $mp - m\epsilon$ to
$mp + m\epsilon$, and it can be demonstrated that by taking $m$ sufficiently large this
probability can be made as near unity as we please. Provided neither $p$ nor $q$ be
indefinitely small, it was first shown by De Moivre, and the proof has often been
reproduced since, that the appropriate areas of the curve:
$$y = \frac{1}{\sqrt{2\pi}\,\sqrt{mpq}}\; e^{-\frac{1}{2}\frac{x^{2}}{mpq}}$$
give very approximately the sums of corresponding terms of the binomial. Thus
the probability $P_\epsilon$ of the observed result lying between $p \pm \epsilon$ is
$$P_\epsilon = \frac{1}{\sqrt{2\pi}\,\sqrt{mpq}} \int_{-m\epsilon}^{+m\epsilon} e^{-\frac{1}{2}\frac{x^{2}}{mpq}}\,dx,$$
or:
$$P_\epsilon = \frac{2}{\sqrt{2\pi}} \int_{0}^{\sqrt{m}\,\epsilon/\sqrt{pq}} e^{-\frac{1}{2}x^{2}}\,dx \tag{x}$$

Accordingly if we indefinitely increase $m$ the upper limit approaches infinity
and the value of the integral $\frac{1}{2}\sqrt{2\pi}$, or $P_\epsilon$ approaches unity. Or again if we need
$P_\epsilon$ to have a definite value for a given $\epsilon$, we can find a value of $m$ which will
insure it. For a given value of $P_\epsilon$ we must have $\sqrt{m}\,\epsilon/\sqrt{pq}$ a constant, or the limit
of $\epsilon$ varies inversely as $\sqrt{m}$, i.e. accuracy varies directly as the square root of the
number of observations. Such is the so-called Bernoulli's Theorem given by the
text-book writers from Cournot in 1840 to the present day. The discussion is
essentially that of De Moivre though his name is not mentioned. The French
and German writers have consciously or unconsciously taken De Moivre’s results
and his very method and attributed them to Bernoulli so that practically the
whole scientific world believes Bernoulli provided what is alone due to De Moivre.
Bernoulli knew nothing of the approach of the normal curve to the high power
binomial, he knew nothing of the inverse square root law of accuracy; he obtained
only the crudest approximations to the number of observations needful to get a
certain value of the odds.
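
The inverse square root law just described can be sketched numerically. The fragment below is only an illustration under stated assumptions: the function name `de_moivre_trials` is hypothetical, the two tails are taken to share the probability 1/(c+1) equally, and the values p = 0.6 and ε = 0.02 are chosen merely as an example.

```python
from statistics import NormalDist


def de_moivre_trials(c: float, p: float, eps: float) -> float:
    """Trials m for which the normal approximation gives odds c : 1 that
    the observed proportion lies within p +/- eps (hypothetical helper)."""
    q = 1.0 - p
    # Assumption: the two tails are equal and together hold 1/(c+1).
    upper = 1.0 - 1.0 / (2.0 * (c + 1.0))
    # z is the constant value of sqrt(m) * eps / sqrt(p*q).
    z = NormalDist().inv_cdf(upper)
    return z * z * p * q / (eps * eps)


if __name__ == "__main__":
    for odds in (1_000, 10_000, 100_000):
        print(odds, round(de_moivre_trials(odds, 0.6, 0.02)))
    # Halving eps should multiply the needful number of trials by four.
    ratio = de_moivre_trials(1_000, 0.6, 0.01) / de_moivre_trials(1_000, 0.6, 0.02)
    print(ratio)
```

Because m enters only through z² pq/ε², the number of trials grows as the inverse square of the limit ε, which is the law in question.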


Bernoulli himself gives an example, namely where $r : s :: 3 : 2$. He takes
$r = 30$, $s = 20$, $t = 50$ and determines the number of trials which will certainly
give odds of $c : 1$ that the observed result will lie between $\frac{3}{5} \pm .02$. I place here
his results against those obtained by De Moivre’s Theory:

| Odds c : 1  | Bernoulli’s Result | De Moivre’s Result |
|-------------|--------------------|--------------------|
| 1000 : 1    | 25,550 trials      | 6,498 trials       |
| 10,000 : 1  | 31,258 trials      | 9,082 trials       |
| 100,000 : 1 | 36,966 trials      | 11,704 trials      |
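
Bernoulli's column can be retraced from the two values of n in (viii). The sketch below is hedged: the name `bernoulli_trials` is hypothetical, and rounding m up to the next whole number is an assumption made here because it reproduces the first tabulated figure exactly; the other two come out within a few trials of the table.

```python
import math


def bernoulli_trials(c: float, r: int, s: int) -> float:
    """Number of trials nt from the larger of the two n values in (viii)."""
    t = r + s

    def side(a: int, b: int) -> float:
        # m from  m > log c(b-1) / (log(a+1) - log a); assumption: take the
        # next whole number above the bound.
        m = math.ceil(math.log(c * (b - 1)) / (math.log(a + 1) - math.log(a)))
        # n = m (1 + b/(a+1)) - b/(a+1)
        return m * (1 + b / (a + 1)) - b / (a + 1)

    n = max(side(r, s), side(s, r))
    return n * t


if __name__ == "__main__":
    for odds in (1_000, 10_000, 100_000):
        print(odds, round(bernoulli_trials(odds, 30, 20)))
```

For r = 30, s = 20 the second value in (viii) is the larger one, so it is the side with log (s+1) − log s that fixes the number of trials.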


These numbers, I think, show that while Bernoulli had a vision of “Bernoulli’s 
Problem,” he failed to solve it in anything like a reasonable manner. His dis- 
cussion is not only lengthy and somewhat obscure, but is too loose to give any 
valid solution. He appreciated, what most experimenters knew already, that one 
gets nearer to the truth by increasing the size of the sample, but he really failed 
to obtain any adequate measure of the approach. Yet Bernoulli himself thought 
his results of “high utility.” The method by which he reached his problem is of 
much interest. After giving a number of definitions of probability, luck, certitude, 
moral certitude, etc., Bernoulli states that what he aims at in his theorem is to 
reach by repeated trials a “moral” certitude of 99 in 100 or 999 in 1000; he
wants to show us that repetition will give us a moral certitude of as great an 
intensity as we please. He fully recognises indeed the importance of the problem 
under consideration ; thus he writes: 


This is therefore the problem that I now wish to publish here, having considered it closely 
for a period of twenty years, and it is a problem of which the novelty as well as the high utility 
together with its grave difficulty exceed in value all the remaining chapters of my doctrine. 
Before I treat of this “Golden Theorem” I will show that a few objections, which certain learned
men* have raised against my propositions, are not valid (Ars Conjectandi, 1713, p. 327).


* See p. 205 above. 
Bernoulli admits that as a matter of experience even in his day, the increase
of accuracy by the increase of trials was recognised. But he was probably the
first to state the problem more or less clearly. It must be admitted, I think, that
his twenty years’ consideration of the matter did not lead him to a solution of
much value.


(3) It may not be without interest to deal a little fully with what seem to be
the weak points in Bernoulli’s method of inequalities. They are essentially twofold:


(a) His method of reaching a value of $x = u_{nr}/u_{nr+n}$, and therefore of $n$, is far
too loose.


(b) His approximation to the $n(s-2)$ terms in the denominator of (iii) is so
appallingly crude that he must get bad results. The idea that the tail may be
divided up into $(s-1)$ series of $n$ terms, and that the last $(s-2)$ series may have
substituted for their proper values that of the first series, leads of course to an
inequality, but one that gives a very poor approximation.


We will consider these points in succession. The actual value of $x$ is given by
$$x = \left(\frac{r}{s}\right)^{n} \frac{(nr-n)!\,(ns+n)!}{(nr)!\,(ns)!}.$$
Now $n$ being large let us apply Stirling’s Theorem, and we find
$$x = \sqrt{\frac{r-1}{r}\cdot\frac{s+1}{s}}\; \left\{\left(\frac{r-1}{r}\right)^{r-1} \left(\frac{s+1}{s}\right)^{s+1}\right\}^{n}, \text{ nearly}.$$
Equating this to $c\,(s-1)$ we find that
$$n > \frac{\log c\,(s-1) - \tfrac{1}{2}\log\dfrac{r-1}{r} - \tfrac{1}{2}\log\dfrac{s+1}{s}}{(r-1)\log\dfrac{r-1}{r} + (s+1)\log\dfrac{s+1}{s}} \tag{xi}$$
Had we worked with the series on the other side of the maximum term, we
should have had:
$$n > \frac{\log c\,(r-1) - \tfrac{1}{2}\log\dfrac{s-1}{s} - \tfrac{1}{2}\log\dfrac{r+1}{r}}{(s-1)\log\dfrac{s-1}{s} + (r+1)\log\dfrac{r+1}{r}} \tag{xii}$$

Computing $n$ from these limits I found for the greater $n$ the values indicated
below:

| Odds c : 1  | Bernoulli     | Bernoulli with better x | De Moivre     |
|-------------|---------------|-------------------------|---------------|
| 1000 : 1    | 25,550 trials | 12,387 trials           | 6,498 trials  |
| 10,000 : 1  | 31,258 trials | 15,165 trials           | 9,082 trials  |
| 100,000 : 1 | 36,966 trials | 17,943 trials           | 11,704 trials |
We have therefore divided Bernoulli’s number of trials roughly by two, when
we use a closer approximation than he used to the value of $x$. We now ask: is it
not feasible to obtain a better approximation to the tail of a binomial than
Bernoulli adopted? The answer is most certainly and in the simplest of manners.
Consider the series beyond the term $nr - n$ in $r$, i.e.
$$S_{nr-n} = u_{nr-n-1} + u_{nr-n-2} + \ldots + u_{0}$$
$$= u_{nr-n}\left\{\frac{s}{r}\,\frac{nr-n}{ns+n+1} + \frac{s^{2}}{r^{2}}\,\frac{(nr-n)(nr-n-1)}{(ns+n+1)(ns+n+2)} + \ldots\right\}.$$
Now $ns + n + i$ is $> ns + n$,
$nr - n - i$ is $< nr - n$;
it follows accordingly that
$$S_{nr-n} \text{ is } < u_{nr-n}\left\{\frac{s}{r}\,\frac{nr-n}{ns+n} + \left(\frac{s}{r}\right)^{2}\left(\frac{nr-n}{ns+n}\right)^{2} + \ldots\right\}$$
$$= u_{nr-n}\left\{\frac{r-1}{r}\,\frac{s}{s+1} + \left(\frac{r-1}{r}\,\frac{s}{s+1}\right)^{2} + \ldots\right\}$$
$$< u_{nr-n}\,\frac{\dfrac{r-1}{r}\,\dfrac{s}{s+1}}{1 - \dfrac{r-1}{r}\,\dfrac{s}{s+1}} = u_{nr-n}\,\frac{(r-1)\,s}{t},$$
summing to infinity, which can only increase the right-hand side.
Applying Stirling’s Theorem to the term
$$u_{nr-n} = \frac{(nt)!}{(nr-n)!\,(ns+n)!}\; r^{nr-n} s^{ns+n} \tag{xiii}$$
we find:
$$S_{nr-n} < \frac{t^{nt}}{\sqrt{2\pi n}}\; B_1^{\,n}\, \sqrt{\frac{s^{2}(r-1)}{t\,(s+1)}} \tag{xiv}$$
where $B_1 = \left(\dfrac{r}{r-1}\right)^{r-1}\left(\dfrac{s}{s+1}\right)^{s+1}$.

Similarly if $S_{ns-n}$ denote the sum of all the terms with values of the powers of
$s$ below $ns - n$, we have:
$$S_{ns-n} < \frac{t^{nt}}{\sqrt{2\pi n}}\; B_2^{\,n}\, \sqrt{\frac{r^{2}(s-1)}{t\,(r+1)}},$$
where $B_2 = \left(\dfrac{r}{r+1}\right)^{r+1}\left(\dfrac{s}{s-1}\right)^{s-1}$.

Now $\frac{1}{2}t^{nt}$ is half the sum of the terms of the total binomial and it is clear that
we have obtained a superior limit to the ratio of the tails to the body, or if
we please the sum of the tails to the total binomial. The method is at least
in one respect superior to De Moivre’s, for the latter makes both tails equal, while
in the present method we can obtain a limit to either tail independently.

Now following Bernoulli’s notation:
$$c = \text{ratio } (\tfrac{1}{2}t^{nt} - \text{tail}) \text{ to tail},$$
$$\frac{1}{c+1} = \text{ratio of tail to } \tfrac{1}{2}t^{nt}.$$
Now take $\zeta_1 = 2n \log_e B_1$, $\zeta_2 = 2n \log_e B_2$, then:
$$\frac{e^{\zeta_1}}{\zeta_1} = \frac{\pi}{4\,(c+1)^{2} \log_e B_1}\cdot\frac{t\,(s+1)}{s^{2}(r-1)} = \frac{.3410940}{(c+1)^{2} \log_{10} B_1}\cdot\frac{t\,(s+1)}{s^{2}(r-1)},$$
and
$$\frac{e^{\zeta_2}}{\zeta_2} = \frac{.3410940}{(c+1)^{2} \log_{10} B_2}\cdot\frac{t\,(r+1)}{r^{2}(s-1)}. \tag{xv}$$
If given values now be chosen for $c$, $r$, $s$ and $t$ and the above equations solved
for $\zeta_1$ and $\zeta_2$, we obtain values of $n$, and thus of $nt$, fixing the number of trials
needful to obtain given odds $c$ to 1 so that the result will lie within the limits
$$\frac{r}{r+s} \pm \frac{1}{t} \tag{xvi}$$
Applying our result to Bernoulli’s first case $c = 1000$ we find:
$\zeta_1 = -10.905205$, giving $nt = 6470$,
$\zeta_2 = -10.898971$, giving $nt = 6502$,
results of the same order as the 6498 of the normal curve, and approximately
again halving the result found by improving $x$. It is therefore clear that of
Bernoulli’s exaggerated limit, which is four times De Moivre’s value, one half is
due to his bad determination of $x$ and one quarter to his poor method of obtaining
a limit to the tail. We can now sum up our results in the following table:


Number of Trials needful to obtain the odds given in the first column that with an
actual probability of success of .6, the observed result will lie between .6 ± .02.

| Odds c : 1  | Bernoulli’s Result | Bernoulli’s Result with better choice of x | Tails as Geometrical Series (two tails) | De Moivre’s Value |
|-------------|--------------------|--------------------------------------------|-----------------------------------------|-------------------|
| 1000 : 1    | 25,550 trials      | 12,387 trials                              | 6,470 and 6,502 trials                  | 6,498 trials      |
| 10,000 : 1  | 31,258 trials      | 15,165 trials                              | 9,005 and 9,050 trials                  | 9,082 trials      |
| 100,000 : 1 | 36,966 trials      | 17,943 trials                              | 11,588 and 11,647 trials                | 11,704 trials     |





The reader will be surprised possibly to notice that the geometrical series 
method gives lower and therefore more accurate results than the normal curve, 
although the differences are not very important. It has also the advantage 
of giving different limits for the two tails, which the normal curve does not 
distinguish owing to its symmetry. 
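
Any of the tabulated numbers of trials can be tested against the binomial itself by summing the tail terms directly. What follows is a rough sketch rather than a replacement for tables of the incomplete B-function: the function names are hypothetical, and the terms are evaluated through the log-gamma function, which is an implementation choice made here.

```python
import math


def tail_probability(n: int, r: int, s: int) -> float:
    """Probability that the successes in n*(r+s) trials fall outside
    nr - n .. nr + n, each trial succeeding with chance r/(r+s)."""
    t = r + s
    trials = n * t
    log_p = math.log(r / t)
    log_q = math.log(s / t)
    lo, hi = n * r - n, n * r + n
    total = 0.0
    for k in range(trials + 1):
        if lo <= k <= hi:
            continue  # central series of terms, not part of the tails
        log_term = (math.lgamma(trials + 1) - math.lgamma(k + 1)
                    - math.lgamma(trials - k + 1)
                    + k * log_p + (trials - k) * log_q)
        total += math.exp(log_term)  # far terms underflow harmlessly to 0
    return total


def odds(n: int, r: int = 30, s: int = 20) -> float:
    """Odds c : 1 that the result lies within the central series."""
    tail = tail_probability(n, r, s)
    return (1.0 - tail) / tail


if __name__ == "__main__":
    print(odds(130))   # about 6,500 trials
    print(odds(511))   # Bernoulli's 25,550 trials
```

With n = 130 the odds come out in the neighbourhood of 1000 to 1, whereas Bernoulli's n = 511 gives odds many orders of magnitude beyond what was asked for, which is the sense in which his limit is exaggerated.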

But there is more than one way of applying the normal curve to binomial
data. The usual way would be that which gives the numbers in the fifth column
above. Namely if $(p+q)^{m}$ be the binomial, we need the frequency between $p \pm 1/t$,
i.e. for the tail we need the area of the normal curve from 0 to $m \times \frac{1}{t}$. But the
area from $-\infty$ to $x/\sigma$ is $\frac{1}{2}(1+\alpha)$ in the notation of the Tables for Statisticians
and Biometricians. In other words $\frac{1}{2}(1+\alpha)$ corresponds to the value
$$x = \frac{\dfrac{1}{t}\times m}{\sqrt{mpq}}$$
in the table of the probability integral. But in Bernoulli’s case $t = 50$, $1/t = .02$,
$p = .6$, $q = .4$ and therefore $\sqrt{m} \times .02/\sqrt{.24} = x$, i.e. $m = 600x^{2}$. But Bernoulli’s
$$c = (\tfrac{1}{2}\,\text{body} - \text{tail})/\text{tail} \quad\text{or}\quad \tfrac{1}{2}(1+\alpha) = 1 - \frac{1}{2(c+1)} = \frac{2c+1}{2c+2}.$$
It is accordingly easy to
find $\frac{1}{2}(1+\alpha)$ and thence $x$ and so $m$ the power of the binomial. I call this the
ordinary or usual process, but it overlooks the discrete character of the binomial.
We suppose our curve areas to represent the discrete values of the terms of the
binomial. Now Bernoulli is seeking the sum of the terms from $nr - n$ to $nr + n$,
including both these. Hence we ought to integrate the curve not from 0 to $n$ but
to midway between the $n$th and $(n+1)$th term, i.e. up to $n + \frac{1}{2}$.

In Bernoulli’s notation $x/\sigma$ will be
$$\frac{n+\frac{1}{2}}{\sqrt{nt \times \frac{3}{5} \times \frac{2}{5}}} = \frac{n+\frac{1}{2}}{\sqrt{12n}},$$
since $t = 50$, and this will
be the value of the argument, $x'$ say, of the table of the probability integral.
Accordingly we have for $\sqrt{n}$
$$\sqrt{n} = \sqrt{3}\,x' + \sqrt{3x'^{2} - \tfrac{1}{2}} \tag{xvii}$$
$x'$ as before being found from the argument of
$$\tfrac{1}{2}(1+\alpha) = (2c+1)/(2c+2) \tag{xviii}$$
When $\sqrt{n}$ has been found from this equation, the power $m$ of the binomial is
$nt = 50n$.


I worked out the values of $m$ for the three cases given by Bernoulli and found
them to be


6448, 9032 and 11654.
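
The half-term adjustment can be retraced in a few lines. This is a sketch under assumptions: the name `corrected_trials` is hypothetical, the probability integral is taken from Python's `statistics.NormalDist` instead of printed tables, and √n is obtained by solving n + ½ = x′ √(n t p q) as a quadratic in √n.

```python
import math
from statistics import NormalDist


def corrected_trials(c: float, p: float = 0.6, t: int = 50) -> float:
    """Trials nt when the normal curve is integrated up to n + 1/2."""
    q = 1.0 - p
    # Argument x' of the probability integral for (2c+1)/(2c+2).
    x = NormalDist().inv_cdf((2.0 * c + 1.0) / (2.0 * c + 2.0))
    # (n + 1/2) / sqrt(n t p q) = x'  gives  n - k sqrt(n) + 1/2 = 0,
    # with k = x' sqrt(t p q), i.e. sqrt(12) x' in Bernoulli's case.
    k = x * math.sqrt(t * p * q)
    root_n = 0.5 * (k + math.sqrt(k * k - 2.0))  # larger root
    return t * root_n ** 2


if __name__ == "__main__":
    for odds in (1_000, 10_000, 100_000):
        print(odds, round(corrected_trials(odds)))
```

The larger root is the relevant one, since the smaller root corresponds to a value of n below unity.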


These make the normal curve method very close indeed in the number of trials
to the geometrical series method. The only advantage remaining to the latter is
the evaluation of the separate tails. It is a somewhat longer process at present,
but this might be remedied by the publication of a table of $e^{-\zeta}/\zeta$, which could
be readily constructed for the suitable range of $\zeta$ values.

After all, I think, we must conclude that it is somewhat a perversion of historical
facts to call the method involved in Column (v) of the Table on p. 209 by the
name of the man who after twenty years of consideration had not got further than
the crude values of Column (ii) with their 200 to 300 per cent. excesses. Bernoulli
saw the importance of a certain problem; so did Ptolemy, but it would be rather
absurd to call Kepler's or Newton’s solution of planetary motion by Ptolemy’s
name! Yet an error of like magnitude seems to me made when De Moivre’s
method is discussed without reference to its author, under the heading of “Bernoulli’s
Theorem.” The contributions of the Bernoullis to mathematical science are
considerable, but they have been in more than one instance greatly exaggerated.
The Pars Quarta of the Ars Conjectandi has not the importance which has often
been attributed to it.


I have to thank Mr James Henderson for aid in the interpretation of Bernoulli's 
somewhat obscure analysis. 








ON THE MEANS, STANDARD DEVIATIONS, CORRELATIONS, 
AND FREQUENCY DISTRIBUTIONS OF FUNCTIONS OF 
VARIATES. 


By PROFESSOR KAZUTARO YASUKAWA, M.Sc.


(1) THE scope of this paper may be described in the following manner: Two
variates $x$ and $y$ have their frequency constants known. $u$ is a single-valued
function of $x$ and $w$ a single-valued function of $y$. It is proposed to study the
frequency constants of $u$ and $w$ with their correlations etc., on the assumption
that the coefficients of variation of $x$ and $y$ are of the order of magnitude that
occurs in the great bulk of biometric investigations. It is not contended that
coefficients of variation cannot be found (such as those for weight of human
adults or for mortality rates for heterogeneous population-masses) which are
unsuited to the expansions herein discussed. But in most anthropometric inquiries
coefficients of variation range from 2.0 to 8.0, and accordingly the ratio of standard
deviation to mean is from about .02 to .08, a reasonable average value being .04.
It is therefore justifiable to expand our functions in series of powers of the
coefficients of variation. Expansions to the third or fourth power are generally
adequate for statistical purposes, and in some cases it is enough to proceed to the
second.





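(2) The notation we shall adopt will be of the usual type. For variate $x$:
$$\bar{x},\quad \sigma_x,\quad v_x = \sigma_x/\bar{x},\quad {}_x\beta_1 = {}_x\mu_3^{2}/{}_x\mu_2^{3},\quad {}_x\beta_2 = {}_x\mu_4/{}_x\mu_2^{2},$$
and for variate $y$:
$$\bar{y},\quad \sigma_y,\quad v_y = \sigma_y/\bar{y},\quad {}_y\beta_1 = {}_y\mu_3^{2}/{}_y\mu_2^{3},\quad {}_y\beta_2 = {}_y\mu_4/{}_y\mu_2^{2},$$
while $r_{xy}$, $r_{uw}$ will represent correlations; $\bar{u}$, $\bar{w}$ the means of $u$ and $w$ respectively
functions of $x$ and $y$, and $\sigma_u$, $\sigma_w$ their standard deviations. When a variate is
dashed, it signifies that it is measured from its mean.


Now $u = \phi_1(x)$ may be written
$$\bar{u} + u' = \phi_1(\bar{x} + x') = f_1\!\left(1 + \frac{x'}{\bar{x}}\right),$$
and it is this function $f_1$, or comparable functions, that we shall use throughout.


Expanding $f_1(1 + x'/\bar{x})$ by Taylor's theorem we obtain
$$\bar{u} + u' = f_1(1) + \frac{x'}{\bar{x}}\, f_1'(1) + \frac{1}{2}\left(\frac{x'}{\bar{x}}\right)^{2} f_1''(1) + \frac{1}{6}\left(\frac{x'}{\bar{x}}\right)^{3} f_1'''(1) + \frac{1}{24}\left(\frac{x'}{\bar{x}}\right)^{4} f_1^{iv}(1) + \text{etc.} \tag{1}$$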
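Summing for all values of $x$ and dividing by their number, we have:
$$\bar{u} = f_1(1) + \tfrac{1}{2}\, v_x^{2}\, f_1''(1) + \tfrac{1}{6}\sqrt{{}_x\beta_1}\; v_x^{3}\, f_1'''(1) + \tfrac{1}{24}\, {}_x\beta_2\, v_x^{4}\, f_1^{iv}(1) + \ldots \tag{2}$$

This series, which expresses the mean value of a function of $x$ in terms of the
frequency constants of $x$, will converge, owing not only to the customary smallness
of $v_x$, but also to the usual convergency of the terms of a Taylor’s series.


Subtracting (2) from (1) we find:
$$u' = \frac{x'}{\bar{x}}\, f_1'(1) + \tfrac{1}{2} f_1''(1)\left\{\left(\frac{x'}{\bar{x}}\right)^{2} - v_x^{2}\right\} + \tfrac{1}{6} f_1'''(1)\left\{\left(\frac{x'}{\bar{x}}\right)^{3} - \sqrt{{}_x\beta_1}\, v_x^{3}\right\} + \tfrac{1}{24} f_1^{iv}(1)\left\{\left(\frac{x'}{\bar{x}}\right)^{4} - {}_x\beta_2\, v_x^{4}\right\} + \ldots \tag{3}$$
or, introducing $v_x$ and $\sigma_x$ and rearranging:
$$u' = v_x f_1'(1)\left[\frac{x'}{\sigma_x} + \tfrac{1}{2} v_x \frac{f_1''(1)}{f_1'(1)}\left\{\left(\frac{x'}{\sigma_x}\right)^{2} - 1\right\} + \tfrac{1}{6} v_x^{2} \frac{f_1'''(1)}{f_1'(1)}\left\{\left(\frac{x'}{\sigma_x}\right)^{3} - \sqrt{{}_x\beta_1}\right\} + \tfrac{1}{24} v_x^{3} \frac{f_1^{iv}(1)}{f_1'(1)}\left\{\left(\frac{x'}{\sigma_x}\right)^{4} - {}_x\beta_2\right\} + \ldots\right] \tag{4}$$

Squaring (4) and taking mean values we reach after some troublesome but
straightforward algebra:
$$\sigma_u^{2} = v_x^{2}\,(f_1'(1))^{2}\left[1 + v_x \frac{f_1''(1)}{f_1'(1)}\sqrt{{}_x\beta_1} + \tfrac{1}{3} v_x^{2} \frac{f_1'''(1)}{f_1'(1)}\, {}_x\beta_2 + \tfrac{1}{4} v_x^{2}\left(\frac{f_1''(1)}{f_1'(1)}\right)^{2}({}_x\beta_2 - 1) + \tfrac{1}{12} v_x^{3} \frac{f_1^{iv}(1)}{f_1'(1)}\, {}_x\beta_3' + \tfrac{1}{6} v_x^{3} \frac{f_1''(1)\, f_1'''(1)}{(f_1'(1))^{2}}\left({}_x\beta_3' - \sqrt{{}_x\beta_1}\right) + \ldots\right] \tag{5}$$
where ${}_x\beta_3' = {}_x\mu_5/\sigma_x^{5} = {}_x\beta_3/\sqrt{{}_x\beta_1}$.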


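Taking the square root we have:
$$\sigma_u = \pm\, v_x f_1'(1)\left[1 + \tfrac{1}{2} v_x \frac{f_1''(1)}{f_1'(1)}\sqrt{{}_x\beta_1} + \tfrac{1}{6} v_x^{2} \frac{f_1'''(1)}{f_1'(1)}\, {}_x\beta_2 + \tfrac{1}{8} v_x^{2}\left(\frac{f_1''(1)}{f_1'(1)}\right)^{2}({}_x\beta_2 - {}_x\beta_1 - 1) + \tfrac{1}{24} v_x^{3} \frac{f_1^{iv}(1)}{f_1'(1)}\, {}_x\beta_3' + \tfrac{1}{12} v_x^{3} \frac{f_1''(1)\, f_1'''(1)}{(f_1'(1))^{2}}\left({}_x\beta_3' - {}_x\beta_2\sqrt{{}_x\beta_1} - \sqrt{{}_x\beta_1}\right) - \tfrac{1}{16} v_x^{3}\left(\frac{f_1''(1)}{f_1'(1)}\right)^{3}({}_x\beta_2 - {}_x\beta_1 - 1)\sqrt{{}_x\beta_1} + \ldots\right] \tag{6}$$

Of the above double sign we must choose the one which always makes $\sigma_u$
positive, i.e. the same sign as that of $f_1'(1)$.


This is the value up to terms of the fourth order in $v_x$ of the standard
deviation of the function of a variate.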
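
A small numerical check may make the series for the mean and the standard deviation more concrete. The sketch below is illustrative only: the function name and argument names are hypothetical, the root of β₁ is taken positive (so a negatively skew variate is not covered), and the series are truncated at the orders written in (2) and (6).

```python
import math


def mean_and_sd(f0, d1, d2, d3, d4, v, beta1=0.0, beta2=3.0, beta3p=0.0):
    """Mean and standard deviation of u = f_1(1 + x'/mean_x).

    f0 and d1..d4 are f_1(1) and its first four derivatives at 1; v is the
    coefficient of variation of x as a ratio; beta3p stands for mu_5/sigma^5.
    The defaults correspond to a normal variate.
    """
    rb1 = math.sqrt(beta1)  # assumption: positive root of beta_1
    mean = f0 + 0.5 * v**2 * d2 + rb1 * v**3 * d3 / 6 + beta2 * v**4 * d4 / 24
    l2, l3, l4 = d2 / d1, d3 / d1, d4 / d1
    bracket = (1.0
               + 0.5 * v * l2 * rb1
               + v**2 * l3 * beta2 / 6
               + v**2 * l2**2 * (beta2 - beta1 - 1) / 8
               + v**3 * l4 * beta3p / 24
               + v**3 * l2 * l3 * (beta3p - beta2 * rb1 - rb1) / 12
               - v**3 * l2**3 * (beta2 - beta1 - 1) * rb1 / 16)
    # abs() applies the rule that the sign is that of f_1'(1).
    return mean, abs(v * d1) * bracket


if __name__ == "__main__":
    # u = x^2 with mean of x equal to 1, so f_1(1 + h) = (1 + h)^2.
    v = 0.04
    mean, sd = mean_and_sd(1.0, 2.0, 2.0, 0.0, 0.0, v)
    exact_sd = 2 * v * math.sqrt(1 + v * v / 2)  # normal x: var = 4v^2 + 2v^4
    print(mean, sd, exact_sd)
```

For the square of a normal variate the mean series terminates and is exact, while the standard deviation agrees with the exact value to the order retained.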











Divide (4) by (6)*:
$$\frac{u'}{\sigma_u} = \pm\,\frac{\dfrac{x'}{\sigma_x} + \tfrac{1}{2} v_x \dfrac{f_1''(1)}{f_1'(1)}\left\{\left(\dfrac{x'}{\sigma_x}\right)^{2} - 1\right\} + \tfrac{1}{6} v_x^{2} \dfrac{f_1'''(1)}{f_1'(1)}\left\{\left(\dfrac{x'}{\sigma_x}\right)^{3} - \sqrt{{}_x\beta_1}\right\} + \tfrac{1}{24} v_x^{3} \dfrac{f_1^{iv}(1)}{f_1'(1)}\left\{\left(\dfrac{x'}{\sigma_x}\right)^{4} - {}_x\beta_2\right\} + \ldots}{1 + \tfrac{1}{2} v_x \dfrac{f_1''(1)}{f_1'(1)}\sqrt{{}_x\beta_1} + \tfrac{1}{6} v_x^{2} \dfrac{f_1'''(1)}{f_1'(1)}\, {}_x\beta_2 + \tfrac{1}{8} v_x^{2}\left(\dfrac{f_1''(1)}{f_1'(1)}\right)^{2}({}_x\beta_2 - {}_x\beta_1 - 1) + \tfrac{1}{24} v_x^{3} \dfrac{f_1^{iv}(1)}{f_1'(1)}\, {}_x\beta_3' + \tfrac{1}{12} v_x^{3} \dfrac{f_1''(1)\, f_1'''(1)}{(f_1'(1))^{2}}\left({}_x\beta_3' - {}_x\beta_2\sqrt{{}_x\beta_1} - \sqrt{{}_x\beta_1}\right) - \tfrac{1}{16} v_x^{3}\left(\dfrac{f_1''(1)}{f_1'(1)}\right)^{3}({}_x\beta_2 - {}_x\beta_1 - 1)\sqrt{{}_x\beta_1} + \ldots} \tag{7}$$

Multiply both sides of (7) by $x'/\sigma_x$ and take the mean of this product:
$$r_{ux} = \pm\left[1 - \tfrac{1}{8} v_x^{2}\left(\frac{f_1''(1)}{f_1'(1)}\right)^{2}({}_x\beta_2 - {}_x\beta_1 - 1) - \tfrac{1}{12} v_x^{3} \frac{f_1''(1)\, f_1'''(1)}{(f_1'(1))^{2}}\left({}_x\beta_3' - {}_x\beta_2\sqrt{{}_x\beta_1} - \sqrt{{}_x\beta_1}\right) + \tfrac{1}{8} v_x^{3}\left(\frac{f_1''(1)}{f_1'(1)}\right)^{3}\sqrt{{}_x\beta_1}\,({}_x\beta_2 - {}_x\beta_1 - 1) + \ldots\right] \tag{8}$$
This is the correlation coefficient between a variate and any function of this
variate, and indicates how far the coefficient diverges numerically from unity.
The true association between the variates would be measured by the correlation
ratio, $\eta_{u.x} = 1$. The terms in $v_x^{2}$ etc. represent how far the association as measured
by the coefficient of correlation falls short of the correlation ratio.

If the variate $x$ follows the normal law,
$${}_x\beta_1 = {}_x\beta_3' = 0 \quad\text{and}\quad {}_x\beta_2 = 3,$$
and†
$$r_{ux} = \pm\left[1 - \tfrac{1}{4} v_x^{2}\left(\frac{f_1''(1)}{f_1'(1)}\right)^{2} + \text{terms in } v_x^{4} \text{ and higher powers}\right] \tag{9}$$
Table I provides a series of values of $r_{ux}$ for various functions $u$ of $x$, when we
give different values to $v_x$. The reader may convince himself from this table how
closely the values of the correlation coefficient approach unity, for all practical
purposes, even with considerable values of $v_x$. The practical statistician dealing
with series of the customary size would have little hesitation in asserting a causal
relation to exist between $u$ and $x$.


TABLE I.

Approximate Values of $r_{ux}$ for various Functions of $x$ and various Values of $v_x$.

| Form of u   | f₁″(1)/f₁′(1) | r_ux         | .01    | .02    | .03   | .04   | .05   | .06   | .07   | .08   | .09   | .10   | .12   | .15   |
|-------------|---------------|--------------|--------|--------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
|             | 2             | 1 − v_x²     | .9999  | .9996  | .9991 | .9984 | .9975 | .9964 | .9951 | .9936 | .9919 | .9900 | .9856 | .9775 |
|             | −2            | 1 − v_x²     | .9999  | .9996  | .9991 | .9984 | .9975 | .9964 | .9951 | .9936 | .9919 | .9900 | .9856 | .9775 |
| u = k √x    | −½            | 1 − (v_x/4)² | 1.0000 | 1.0000 | .9999 | .9999 | .9998 | .9998 | .9997 | .9996 | .9995 | .9994 | .9991 | .9986 |
| u = k log x | −1            | 1 − (v_x/2)² | 1.0000 | .9999  | .9998 | .9996 | .9994 | .9991 | .9988 | .9984 | .9980 | .9975 | .9964 | .9944 |

* The sign chosen in (7) and (8) must be the same as that adopted for (6) as stated in the text.
† The sign to be selected again as in (6).
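
The entries of Table I follow from the normal-law formula (9) with the higher terms dropped, and can be regenerated in a few lines. The sketch is illustrative: the name `r_ux` and the dictionary of ratios are hypothetical conveniences, and only the leading correction ¼ v² (f″/f′)² is kept.

```python
def r_ux(ratio: float, v: float) -> float:
    """Leading terms of (9): 1 - (1/4) v^2 (f''(1)/f'(1))^2, sign taken positive."""
    return 1.0 - 0.25 * v * v * ratio * ratio


if __name__ == "__main__":
    # Values of f''(1)/f'(1) appearing in Table I.
    ratios = {"2": 2.0, "-2": -2.0, "-1/2": -0.5, "-1": -1.0}
    cvs = (0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10, 0.12, 0.15)
    for label, ratio in ratios.items():
        row = " ".join(f"{r_ux(ratio, v):.4f}" for v in cvs)
        print(f"{label:>5}  {row}")
```

Since the ratio enters only through its square, the rows for 2 and −2 coincide, as they do in the table.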
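We will now suppose $w$ is a function of the second variate $y$, or
$$w = \phi_2(y) = f_2\!\left(1 + \frac{y'}{\bar{y}}\right).$$
We have expressions for $\bar{w}$, $w'$ and $\sigma_w$ exactly analogous to those for $\bar{u}$, $u'$
and $\sigma_u$ in (2), (3) and (6), except that $f_2$ now replaces $f_1$, $v_y$ replaces $v_x$ and the
$\beta$’s of $y$ those of $x$. In order to abbreviate the results we will use single symbols
for the functional constants, namely
$$\lambda_1'' = \frac{f_1''(1)}{f_1'(1)},\qquad \lambda_1''' = \frac{f_1'''(1)}{f_1'(1)},\qquad \lambda_1^{iv} = \frac{f_1^{iv}(1)}{f_1'(1)},$$
$$\lambda_2'' = \frac{f_2''(1)}{f_2'(1)},\qquad \lambda_2''' = \frac{f_2'''(1)}{f_2'(1)},\qquad \lambda_2^{iv} = \frac{f_2^{iv}(1)}{f_2'(1)}.$$

We may now write down*
$$\frac{u'}{\sigma_u} = \pm\,\frac{\dfrac{x'}{\sigma_x} + \tfrac{1}{2} v_x \lambda_1''\left\{\left(\dfrac{x'}{\sigma_x}\right)^{2} - 1\right\} + \tfrac{1}{6} v_x^{2} \lambda_1'''\left\{\left(\dfrac{x'}{\sigma_x}\right)^{3} - \sqrt{{}_x\beta_1}\right\} + \tfrac{1}{24} v_x^{3} \lambda_1^{iv}\left\{\left(\dfrac{x'}{\sigma_x}\right)^{4} - {}_x\beta_2\right\}}{1 + \tfrac{1}{2} v_x \lambda_1''\sqrt{{}_x\beta_1} + \tfrac{1}{6} v_x^{2} \lambda_1'''\, {}_x\beta_2 + \tfrac{1}{8} v_x^{2} \lambda_1''^{\,2}({}_x\beta_2 - {}_x\beta_1 - 1) + \tfrac{1}{12} v_x^{3} \lambda_1'' \lambda_1'''\left({}_x\beta_3' - {}_x\beta_2\sqrt{{}_x\beta_1} - \sqrt{{}_x\beta_1}\right) + \tfrac{1}{24} v_x^{3} \lambda_1^{iv}\, {}_x\beta_3' - \tfrac{1}{16} v_x^{3} \lambda_1''^{\,3}({}_x\beta_2 - {}_x\beta_1 - 1)\sqrt{{}_x\beta_1}} = \pm\,\frac{N_1}{D_1}, \text{ say} \tag{10}$$
$$\frac{w'}{\sigma_w} = \pm\,\frac{\dfrac{y'}{\sigma_y} + \tfrac{1}{2} v_y \lambda_2''\left\{\left(\dfrac{y'}{\sigma_y}\right)^{2} - 1\right\} + \tfrac{1}{6} v_y^{2} \lambda_2'''\left\{\left(\dfrac{y'}{\sigma_y}\right)^{3} - \sqrt{{}_y\beta_1}\right\} + \tfrac{1}{24} v_y^{3} \lambda_2^{iv}\left\{\left(\dfrac{y'}{\sigma_y}\right)^{4} - {}_y\beta_2\right\}}{1 + \tfrac{1}{2} v_y \lambda_2''\sqrt{{}_y\beta_1} + \tfrac{1}{6} v_y^{2} \lambda_2'''\, {}_y\beta_2 + \tfrac{1}{8} v_y^{2} \lambda_2''^{\,2}({}_y\beta_2 - {}_y\beta_1 - 1) + \tfrac{1}{12} v_y^{3} \lambda_2'' \lambda_2'''\left({}_y\beta_3' - {}_y\beta_2\sqrt{{}_y\beta_1} - \sqrt{{}_y\beta_1}\right) + \tfrac{1}{24} v_y^{3} \lambda_2^{iv}\, {}_y\beta_3' - \tfrac{1}{16} v_y^{3} \lambda_2''^{\,3}({}_y\beta_2 - {}_y\beta_1 - 1)\sqrt{{}_y\beta_1}} = \pm\,\frac{N_2}{D_2}, \text{ say} \tag{11}$$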


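Multiplying (10) and (11) and taking the mean we get for the mean numerator:
$$\frac{1}{N} S(N_1 N_2) = \frac{1}{N} S\left[\left\{\frac{x'}{\sigma_x} + \tfrac{1}{2} v_x \lambda_1''\left(\frac{x'^{2}}{\sigma_x^{2}} - 1\right) + \tfrac{1}{6} v_x^{2} \lambda_1'''\left(\frac{x'^{3}}{\sigma_x^{3}} - \sqrt{{}_x\beta_1}\right) + \tfrac{1}{24} v_x^{3} \lambda_1^{iv}\left(\frac{x'^{4}}{\sigma_x^{4}} - {}_x\beta_2\right)\right\} \times \left\{\frac{y'}{\sigma_y} + \tfrac{1}{2} v_y \lambda_2''\left(\frac{y'^{2}}{\sigma_y^{2}} - 1\right) + \tfrac{1}{6} v_y^{2} \lambda_2'''\left(\frac{y'^{3}}{\sigma_y^{3}} - \sqrt{{}_y\beta_1}\right) + \tfrac{1}{24} v_y^{3} \lambda_2^{iv}\left(\frac{y'^{4}}{\sigma_y^{4}} - {}_y\beta_2\right)\right\}\right]$$
$$= \frac{1}{N}\frac{S(x'y')}{\sigma_x \sigma_y} + \tfrac{1}{2} v_x \lambda_1''\, \frac{1}{N}\frac{S(y'x'^{2})}{\sigma_y \sigma_x^{2}} + \tfrac{1}{2} v_y \lambda_2''\, \frac{1}{N}\frac{S(x'y'^{2})}{\sigma_x \sigma_y^{2}} + \tfrac{1}{4} v_x v_y \lambda_1'' \lambda_2''\left\{\frac{1}{N}\frac{S(x'^{2}y'^{2})}{\sigma_x^{2}\sigma_y^{2}} - 1\right\}$$
$$+ \tfrac{1}{6} v_x^{2} \lambda_1'''\, \frac{1}{N}\frac{S(y'x'^{3})}{\sigma_y \sigma_x^{3}} + \tfrac{1}{6} v_y^{2} \lambda_2'''\, \frac{1}{N}\frac{S(x'y'^{3})}{\sigma_x \sigma_y^{3}} + \tfrac{1}{24} v_x^{3} \lambda_1^{iv}\, \frac{1}{N}\frac{S(y'x'^{4})}{\sigma_y \sigma_x^{4}} + \tfrac{1}{24} v_y^{3} \lambda_2^{iv}\, \frac{1}{N}\frac{S(x'y'^{4})}{\sigma_x \sigma_y^{4}}$$
$$+ \tfrac{1}{12} v_x^{2} v_y \lambda_1''' \lambda_2''\left\{\frac{1}{N}\frac{S(x'^{3}y'^{2})}{\sigma_x^{3}\sigma_y^{2}} - \sqrt{{}_x\beta_1}\right\} + \tfrac{1}{12} v_x v_y^{2} \lambda_1'' \lambda_2'''\left\{\frac{1}{N}\frac{S(x'^{2}y'^{3})}{\sigma_x^{2}\sigma_y^{3}} - \sqrt{{}_y\beta_1}\right\}.$$

* The sign to be chosen as before.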
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Now in this expression nine mean products occur: 
1 S(y/a2*) 1 S(a’y*) 1 S(a*y*) 1 S(a’y*) 1 S(y’2") 
N Gyo, ’ N Oz0y | o20,2 ’N o,0,; ’ N Oyo,) 
1 S(ay') 1S(a'y") LS(@y") 4 LSey*) 
i N az'cy ’ N o,0,' ’? N o20,? a Zo, | 
These are multiplied respectively by vd", yAo”, VeVy Ay” Ao”, Vg7Ay”, VyPAo”, V2? Ma", 
Vy Ag, Vg? VyAy Ae” and vz0,7A,'rAo”’. The last four are certainly of a fairly low 
order, the first five of more importance. But it is clear that we cannot without a
knowledge of the $x$ and $y$ frequency distributions proceed further unless we make
some assumptions, which will provide at any rate approximations to these terms*.
If we assume linear regression then:
LS@y*?) = 
WV poy? =Tay V yP, 
1 S(a’y') : 
N ozo, =lry yPs > 
L S(e'y") 


N Cx oc, 


1 S(y/'#”) 
N «o,o,7 
1 S(y'x'*) 
N aycg' 
1 S(y'#*) 


NH aye 


=1zy V 2B, 
= lzy as » 


= lay yPo, 


If we assume in addition homoscedasticity : 
1 S(a?y”) 


N ee; 


=Try Pa. 


=1+ 3 Voy (2B. = yx = 2), 


} 18S (a? y’*) - 


T ° +s 
N a;20,; 


(1 — tgp?) VB: + Tay? yR’» 


“ =(1- Vey’) V 2B; = yy? eM 





The first four and the last two are all zero for normal distributions of $x$ and $y$,
and accordingly will be small for moderate skewness. The seventh one for
normality becomes $1 + 2r_{xy}^2$, and the fifth and sixth $3r_{xy}$.

With the above results we obtain for the mean values of $N_1N_2$ and $D_1D_2$:
Mean (N,N) = rey + 4g M1" Vey VaBi +E 0y de" Cey VyBi 

+ G7” Py xB. + Ey? re” PayyBs + $V2Vy a Ae" Vay? (82+ yB2— 2) 

+ ete? Ma” rey aBa + gly Vy?De" Tey ye 

+ 50,7 Vy ry” Ag’ Tey? (Bs = V 1) +75 Ugly", Ag Tay" (yBs' i V yBs)- 

D, D, =1+ dugdy” VB a a EVyAy” V vB Feil $0.7” aBo+ Luger,” (2B. — 8,—1) 

} + $V p yy Ay” V PryR + $0, Ng” yBo + h dy? Ae’? (yB2— yBi — 1) 
+ ity Very Ay” (28s, = aBs Vii —V 2Bi)+ a4 0, ry" x . —y gry? (eRe = rBi = 1) V Br 
+ is Vy Aa” ro” (yBs sa yBx VyRi—V BR i)+ 24 Uy? do yBs — T 6 Uy Ay (yB: i's Ri = E>) VB 1 
+ hy Vg? Vy Ay!" a”” (Bo — Br — 1) VyBr + py Va? yy” do” VyB Bs 
+ gy Vey? i Ag” V Bi (Be — Bi — 1) + ere dy? Ay Ao” V By uBe- 


* See Biometrika, Vol. IX. pp. 4, 5, 7.
Hence:
1 ww’ 


Y 


Tuw N 
TyTw 
” a” ” aur o “ur 
PTxy + 4 UzrAy Voy V Bi i SVy AL Vey VB, + $02 Vey «Bes cs i Vy" Az Vay yBet 
ay Wy 4 3} iv » ¥: 9.3) iV > , 
+ $UzVyAy Ae Vpy? (a8. + yRB2— 2)+34 Uy Ay” Teyabs = 24 Vy oD ay yes 


_4* L+ ug tyra” re” Pay? (Bs — VB:) + Px Uy? Vea Ae” Tay? (yBs — V WB1) as 
‘i D,D, 
Expanding $D_1D_2$ we find finally:
Tuw = they [1 - §u,7A,"" (eB2— ai — 1) — hv, r."* (yB2— yBi — 1) 
+ vpty ya” {ray (Be + yBs— 2) — 4 VcBryBr| 
—Vy 2 yy” (eBs — «Be V8 ec V Ps) — as Vy Ae Ay” (yB: — yBe VR = V Bi) 
+ hvP dy? VB, (eBs— Bi — 1) + h0y?Ao? Vy, (yBo — Bi — 1) 
+ gy Va? Dy ye” V gBy (2V zBiyBi — Tey (Bo + yB:2— 2)} 
+ he Og Vy? Ay” Ae” VPs (2V 1B, 8, — Try (aB2+ yRs — 2)} 
yey hs”Ae (eBa VoB + Pay (MBs = 2B) 
es ty Sa eee ee ly ) eee (12). 
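For the particular case of $x$ and $y$ having normal frequency:
$$r_{uw} = \pm\, r_{xy}\left[1 - \tfrac14 v_x^2\lambda_1''^2 - \tfrac14 v_y^2\lambda_2''^2 + \tfrac12 v_xv_y\lambda_1''\lambda_2''\,r_{xy} + \text{fourth order terms}\right] \tag{13}$$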





For this special case we easily deduce 
Pow = Ei Pay (1 — (V1 = tug — V1 — Py} cee cece cece ee eees (14), 
and this probably gives in most cases a reasonable approximation. 

Table II (p. 217) indicates the magnitude of the percentage changes in the
correlation of $x$ and $y$ which we obtain when we correlate functions of $x$ and $y$
instead of $x$ and $y$; they are calculated on an average value of the coefficients of
variation. It will be seen that the percentages reached are inside the probable
errors of most correlation coefficients. Even if we raise the coefficients of variation
to 8, or the order of $v_x$ and $v_y$ to .08, we shall then only have to multiply these
percentages by four, and we shall still find they are of very small importance. It
is not therefore reasonable to anticipate that in the great majority of cases any
substantial increase of correlation will be obtained by correlating functions of variates
instead of variates themselves.
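
The "multiply by four" remark follows because every second-order term in the bracket of (13) is quadratic in the coefficients of variation. A minimal sketch of that scaling is given below; the function name `pct_change_r`, the baseline value .04 and the choices $\lambda_1'' = \lambda_2'' = 2$, $r_{xy} = .5$ are illustrative assumptions, not values taken from Table II.

```python
def pct_change_r(v_x, v_y, lam1, lam2, r_xy):
    """Percentage change of r_uw relative to r_xy from formula (13),
    normal x and y, fourth and higher order terms dropped."""
    bracket = (-0.25 * v_x**2 * lam1**2
               - 0.25 * v_y**2 * lam2**2
               + 0.5 * v_x * v_y * lam1 * lam2 * r_xy)
    return 100.0 * bracket

# Hypothetical inputs: cube-like functions (lambda'' = 2) and r_xy = .5.
base = pct_change_r(0.04, 0.04, 2.0, 2.0, 0.5)
doubled = pct_change_r(0.08, 0.08, 2.0, 2.0, 0.5)
print(base, doubled, doubled / base)
```

Doubling both coefficients of variation multiplies the percentage change by four, whatever the values chosen for the $\lambda$'s and $r_{xy}$.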

Actual Numerical Illustration. It appeared to me desirable to illustrate these 
theoretical conclusions by an actual sample population. I accordingly tabled 1112 
cases of stature in Father and Son from Pearson’s family measurements. The 
correlation table is provided in Table III (p. 218). I then took the cubes of the 
statures as provided in Table IV (p. 219), using as my unit 1000 cubic inches. 


* In the above expression the positive or negative sign is to be taken according as $f_1'(1)\,f_2'(1)$ is
positive or negative.

† Choice of sign must be the same as in the previous equation.
‡ The choice of sign must as before depend on that of $f_1'(1)\,f_2'(1)$.





TABLE II.

Correlation of $u$ and $w$, Functions of $x$ and $y$ respectively.
TABLE III.

Correlation between Father's Stature and Son's Stature.

[Correlation table not recoverable from the scan. Axes: Father's Stature and Son's Stature. Legible entry: $r_{xy} = .515 \pm .015$.]
(N.B. No correction for grouping applied.)

TABLE IV.

Correlation between Cube of Father's Stature and Cube of Son's Stature.

Unit 1000 cubic inches.

(Son's Stature)$^3$
The correlations as found by the product-moment method were respectively:
$$r_{xy} = .515 \pm .015, \qquad r_{uw} = .514 \pm .015.$$
Here $\lambda_1'' = \lambda_2'' = 2$, $v_x = .03997$, $v_y = .04014$.
Applying formula (13) we have
$$r_{uw} = .515\left[1 - \tfrac14 (.04014)^2\,4 - \tfrac14 (.03997)^2\,4 + \tfrac12 (.04014)(.03997)\,4 \times .515\right] = .514,$$
agreeing exactly with correlation table method. It is clear that in such a case
nothing whatever is to be gained by using a function of $x$ instead of $x$ in
correlating.
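
The arithmetic of this illustration can be checked directly. The sketch below evaluates the second-order bracket of formula (13) with the values quoted in the text; only the variable names are assumptions.

```python
# Values quoted in the text for the father/son stature example.
r_xy = 0.515
v_x, v_y = 0.03997, 0.04014
lam1 = lam2 = 2.0  # lambda'' = f''(1)/f'(1) = 2 for a cube

# Bracket of formula (13), fourth order terms dropped.
bracket = (1.0
           - 0.25 * v_x**2 * lam1**2
           - 0.25 * v_y**2 * lam2**2
           + 0.5 * v_x * v_y * lam1 * lam2 * r_xy)
r_uw = r_xy * bracket
print(round(r_uw, 3))
```

The bracket differs from unity by less than two parts in a thousand, which is why the cubed statures give practically the same coefficient as the statures themselves.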


(3) Frequency Distributions. We now pass to another matter which has considerably
troubled certain statisticians and to some extent the trouble may be
justified. Namely it is argued that if $x$ follow a normal distribution it is not
possible for $u = \phi_1(x)$ to do so also. This is clearly correct theoretically, but we
have not yet met with an attempt to estimate the degree of divergence from the
exact values when, knowing that $x$ follows a normal distribution, we proceed to
calculate the frequencies of $u$ from a normal distribution also. We shall now
investigate this point.





First Method. From (4) and (6) we get* 


u a’ n((@s? sew (/a'\3 a’ — 
= 1 —)—V¥ — ee a | ia ee 
ee - E + 3 Ugry (5) 2B, Ce i} +§ 027m (=) 2Be o, eB} 
ee a’ \? , a! 
- $u2r, . {2 V Bi (( =| aie 1) + (282-328, — 1) 
Cx Cy 


= Pst2"" (zy (xBe ? 328, an 1) >" V By (328: 5a, —- 3)- By, + oP, + 1 


= Hee "(WB (E) + Ba (S) + 8! — 308. V Bi — VB) = —28,— Bik 


a’ a\* : ‘ ‘ 
— 40°" {08 + 7Bs - - ¢ ) 14 terms in v,' and higher powers .. (15). 
x z 


Differentiating (15) 

du’ da’ a’ == a’ \2 

me os dur,” Se ayn” 19 (~)_ 2 

+ E +40, {2 = V-B + h02r, {3 (=) A 
~ nem" {avaB, (E) +08. 346,—1} 
‘ the ‘ x’ “yD - 

as Fee is {2 (xB. — 3,8, — 1) igs V8; (3,8. — 5B, = 3)} 
ai yt)" 308, (¢ ) + 228. + 28s! — BeBe VaBi VB 


+ Her, {4 (=) = 6} + terms in v,‘ and higher powers ...(16). 
‘ az’ 


* See footnote on p. 214. 
Squaring (15)
u'\2 a \? Pe (, a \3 — fa'\? “ 
(5) : GS bi (5) — Nab a as =| 
oa’ \4 “\2 Pe 
+ $02," (=) — «By (=) = VB =| 
Cx Ox, Ox 


!y\4 — (a \* ‘ f w 
sie {E) wa (2) tas (ffosvad 


J Cx 
— Lygr,"? {2 <8, (2) +2 (eBe—4e8,— 1) (=) — VB; (4eB.— 8281) (<) 
— 2 (a2 — 48; — 1) + 2/61) 


+wenrar (2-02) een (2) 


Cx 


Ine 


— (eB ~ 448. VeBs) (=) +208, + xB) = + VB} 


x 
Cx 


3) iv a ’ a’ \? aw \*) 
ye .* V_? Ay 08: a = «Bs (=) - (=) f 
; % - 


+ terms in v,‘ and higher powers ................cseceeeerseeees Leaked (17). 


Hence raising $(u'/\sigma_u)^2$ to the exponential and expanding the right-hand side
we find:
42) 
e \ u =@ x 


” ( d : 2 S % g be r’ ) 
1- $U2ry (2 ) = VB, ) ral | 
| Cx ox) 


Cx/ 


=) + («8; — 3) (=) +6v.B, (“y 


( 
+ (Bs 448, +2) (2) — av 8, = — 1! 


- ween ((2) 04a (2) +0,-0) (2) + el a9(2) 


+ (3282 — 2728; + 9) (zy — BV 2A) (8s — 4, + 12) (=) 


— (928.— 48,8, + 1) (2) +¥.8, (1298-248 + 15)(") 


‘ , 
a“ 
' x ox 


+ 3 (2,8. a 8228) ree 1) . a” 6. 
Ox 
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pug ay ()- VB, (=) — (eB2+ 2) (2) +v 8, (Ba a8.+1)(= ) 


+ (818,48 +1) (=) + (A) — 48. VB + VeB) (=) | 
— 2(28, + 2:) = -v Bt 


~ (fe ,(a\? x 
— g702*d,” 1) — «Bs (=) — Bo = 


+ terms in v,‘ and higher powers| wei alent ietraghedoanaadwenalnndenaamiens (18). 


Multiplying (16) and (18) we get 


, 2 ; , 2 » 
~% (: ) du =% (: ) da 
e ou =+e ws 


u x 


E — fu,r,” (zy - VB, (=) - 3 (=) + V8 


we {(% \° ry. wre ¥ af \? 
+ gun" {(Z) —2v.8, (E)' + 68,-1) (Z) +12v.8, (2) 
rid x «“ 


“ 


, a \? / a! P ) 
+ (28. — 608, + 6)(*) — 10V,8," ~ 8. +328) 
Cx Cx ) 
° wr t/a’ ‘ P a 2 a 
i § U2 (=) ‘it (xP2 + 3) (=) = VB He + A 


ee | (ey - vB, (2) +88 -4) (Z) +V/<6, (36 — 8.) ( zy 


ToS ( ) — 8V 28; (82 — 528; +31) (7) 


, 
“ 
Cx 


— (18,8, — 11,9, + 10) (2) + VB, (18,8,~ 45,8, + 54) (*) 


a 
Cx 
+ 3 (52B. ai 21,7; a 2) = _ 3V 2B, (328. = 5p) 


Cw» 


+ yon’ (2) —Vi8, (2) - 68.40 (2) + Va CB.+5) (2) 


(\ox 


av 
+ (685+ 8.4 4)(Z) + (8! —608,¥B) (7) — a8 + 88) ¢ 
— 8; +328, VB 


oy 


+ terms in v,* and higher powers| dekvobeestenssuniegeetiertetniaaenrmendl 
For the particular case of $x$ having normal frequency


| | 
x < 
° 8 
yO, 
— “_—~. 
a ~ in] S> 
- 8. 8 i 
a oo 
a | 
| 7) 
| 
en, 
) CHES 
= 8 
9 ett 
ee. + 
4 oe 
ww 
oO 
7 SP 
Q/8 
8 
ell 
| 
co 
Ws 
Pie 
y |& 
~~ 
+ 
os 
od 
4 iy 
Py & 
~ a 
mom” 


oe 
5 
% 


rae (2) 10 (2) +22) -15(2)} 
- seen {(2)-4(Z) -3 2) 


Multiplying $\frac{1}{\sqrt{2\pi}}$ into (20) and integrating this product between corresponding
values of $u'$ and $x'$, i.e. $u_1'$ and $u_2'$ corresponding to $x_1'$ and $x_2'$, we obtain (21).

All the integrations on the right can be found by tables of the incomplete
normal moment functions using $m_n$ in the significance of the book of Tables*
supposing $x_1'$ to be zero and $m_n$ to stand for $m_n(x_2'/\sigma_x)$.


1 we! 5 (“) du! 
2a é ou 


J uy u 


> 


= + {m,—$v,dr," (2m, — 3m) 

+ doer? (15m, — 21m, + 9m, — 3m,) 

— huZr," (8m, — 6m, + 3m,) 

— Fy v8 dh)" (384m, — 576m, + 288m, — 128m, + 39m,) 

+ yuk A” (48m, — 80m; + 44m, — 15m) 

— 350°" (8m, — 8m, — 3m) 

+ terms in v,‘ and higher powers} ..............sseeseeeseeeees (21). 
(21) gives the frequency of normal distribution with regard to $u$, between $u_1'$ and
$u_2'$, having its mode at the mean of $u$, i.e. $u' = 0$†. True frequency in that interval
being $m_0$, the total value of the remaining terms represents an excess beyond
normality. This excess divided by $m_0$ and multiplied by 100 will give the

* See Tables for Statisticians and Biometricians, Part I. pp. 22-23, Table IX.
† This is the usual manner of fitting a normal curve to a series of observations, i.e. we make the
mean of observations and origin of normal curve coincide, and use the second moment coefficient of the
observations ($\sigma_u^2$) about that mean.
percentage of deviation from normality or, as we may say, percentage errors. Table V
gives these errors for the special case of $u = kx^3$ and $v_x = .04$.

In Table V, the first two columns give the corresponding values of $x'/\sigma_x$ and
$u'/\sigma_u$ at the integral limits. The third column gives the exact frequency $m_0$. The
fourth, fifth and sixth columns give the contributions of the terms in $v_x$, $v_x^2$ and $v_x^3$
to the total value of the integral
$$\frac{1}{\sqrt{2\pi}}\int_{u_1'}^{u_2'} e^{-\frac12 (u'/\sigma_u)^2}\,\frac{du'}{\sigma_u}$$
as set out in (21).
The last column gives the combined results; and the comparison is to be made
between the $m_0$ column and the combined result in the last column.

TABLE V.

Comparison of Exact Frequency and that of Normal Curve having its Mode
at the Mean of u. Calculated by (21) for $u = kx^3$ and $v_x = .04$.

(Lower limit $u_1'/\sigma_u = -.0398725$ in every row; the upper limits $u_2'/\sigma_u$ are illegible in the scan.)

 x'/σ_x range |    m_0    |  v_x term   |  v_x² term  |  v_x³ term  | Combined Results
 0 to   .1    | .0398278  | +.00023836  | -.00015794  | -.00000145  | .0399068
 0 to   .5    | .1914625  | +.00539574  | -.00061905  | -.00002843  | .1962108
 0 to  1.0    | .3413447  | +.01595770  | -.00064526  | -.00005532  | .3566018
 0 to  1.5    | .4331928  | +.02243357  | -.00063140  | -.00004358  | .4549514
 0 to  2.5    | .4937903  | +.01963864  | -.00096040  | +.00007187  | .5125404
 0 to  -.1    | .0398278  | -.00023836  | -.00015794  | +.00000145  | .0394330
 0 to  -.5    | .1914625  | -.00539574  | -.00061905  | +.00002843  | .1854761
 0 to -1.0    | .3413447  | -.01595770  | -.00064526  | +.00005532  | .3247971
 0 to -1.5    | .4331928  | -.02243357  | -.00063140  | +.00004353  | .4101714
 0 to -2.5    | .4937903  | -.01963864  | -.00096040  | -.00007187  | .4731194

A comparison of these two columns indicates that for many practical purposes a knowledge of the
$v_x$ term would be adequate for correction of the frequency. But to this degree of
approximation:


we pe 
Qar J uy’ ou “V2r ay’ 


tpa [oe (2) aff. 


The first integral on the right is the probability integral equal to (in the notation
of the Tables for Statisticians) $\tfrac12(1 + \alpha_{x_2'}) - \tfrac12(1 + \alpha_{x_1'})$. The term in square brackets
is $\sqrt{6}\,\tau_3(x'/\sigma_x)$, where $\tau_3$ is the tetrachoric function of the third order and will be
found tabled in the book of Tables for Statisticians, Table XXIX, pp. 42-51.


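Thus we have
$$\frac{1}{\sqrt{2\pi}}\int_{u_1'}^{u_2'} e^{-\frac12 (u'/\sigma_u)^2}\,\frac{du'}{\sigma_u} = \pm\left\{\tfrac12(1+\alpha_{x_2'}) - \tfrac12(1+\alpha_{x_1'}) + \sqrt{\tfrac32}\,v_x\lambda_1''\left(\tau_3\!\left(\frac{x_2'}{\sigma_x}\right) - \tau_3\!\left(\frac{x_1'}{\sigma_x}\right)\right)\right\}$$
$$= \pm\left\{\tfrac12(1-\alpha_{x_1'}) - \tfrac12(1-\alpha_{x_2'}) + \sqrt{\tfrac32}\,v_x\lambda_1''\left(\tau_3\!\left(\frac{x_2'}{\sigma_x}\right) - \tau_3\!\left(\frac{x_1'}{\sigma_x}\right)\right)\right\} \tag{22}$$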
and the approximation can thus be found by use of the tables of the probability
integral and that of the third tetrachoric function. Some but not so great a gain
can be obtained by expressing the coefficients of $v_x^2$ and $v_x^3$ in terms of the tetrachoric
functions instead of the incomplete moment functions.
It is easy to obtain from $\tau_3$ an appreciation of the maximum error due to this
correction.
If we take any two successive values in the first column of the Table XXIX for
$\tfrac12(1 - \alpha_{x_1'})$, $\tfrac12(1 - \alpha_{x_2'})$ then $\tau_3(x_2'/\sigma_x) - \tau_3(x_1'/\sigma_x)$ will be given in the corresponding row
in the $\tau_3$ column. Since the difference of the former two is constant the region of
maximum difference in the $\tau_3$ column will give maximum error.
So long as the central region of the Table covers the difference, .00082 will be
found to be the maximum. The corresponding values of $\tfrac12(1 - \alpha)$ will run from .174
to .146 with mean of .160 for which $x'/\sigma_x = 1.08$.
Hence maximum error will be roughly
$$100\,\sqrt{\tfrac32}\,v_x\lambda_1'' \times .00082/.001 = E_{max}, \text{ say.}$$
Let us take again $v_x = .04$, then we find:
A. $u = kx^3$, $\lambda_1'' = 2$: $|E_{max}| = 8.0\%$.
B. $u = k\sqrt{x}$, $\lambda_1'' = -\tfrac12$: $|E_{max}| = 2.0\%$.
C. $u = k/x$, $\lambda_1'' = -2$: $|E_{max}| = 8.0\%$.
D. $u = k\log x$, $\lambda_1'' = -1$: $|E_{max}| = 4.0\%$.
If we choose $x_1'/\sigma_x = 0$, so that we have:
$$\tfrac12(1 - \alpha_{x_1'}) = .5000, \qquad \tau_3\!\left(\frac{x_1'}{\sigma_x}\right) = -.16287,$$
we must find the value of $x_2'/\sigma_x$ which gives the maximum error; this means that
we must look for the maximum value of
$$\left\{\tau_3\!\left(\frac{x_2'}{\sigma_x}\right) + .16287\right\} \Big/ \left\{.500 - \tfrac12(1 - \alpha_{x_2'})\right\}.$$
We easily deduce that
$$\frac{.06304 + .16287}{.500 - .073} = .52906,$$
corresponding to $x_2'/\sigma_x = 1.45$, gives us the maximum.

Hence max. error $= 100\,\sqrt{\tfrac32}\,v_x\lambda_1'' \times .52906$.
Let $v_x = .04$. Then:
A. $u = kx^3$, $\lambda_1'' = 2$: max. error = 5.18%.
B. $u = k\sqrt{x}$, $\lambda_1'' = -\tfrac12$: max. error = 1.30%.
C. $u = k/x$, $\lambda_1'' = -2$: max. error = 5.18%.
D. $u = k\log x$, $\lambda_1'' = -1$: max. error = 2.59%.


Further we can predict that in the case of different functions the percentage
errors will be approximately proportional to their $\lambda_1''$'s.
Difference of $\tau_3$ vanishes at the point
$\tfrac12(1 - \alpha) = .0415$, or $x'/\sigma_x = 1.73$.

At that point the error will be very small. In the case of either "tail," the
further we go, the greater the percentage error will be; thus there is no maximum
in the case of "tail" evaluation. This is, however, of small importance as the
absolute frequency in the tails becomes for practical purposes insensible.
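
The figures .16287, .52906, 1.45 and 1.73 can be checked without the printed tables. The sketch below assumes the standard definition of the third tetrachoric function, $\tau_3(h) = (h^2 - 1)\,\phi(h)/\sqrt{3!}$ with $\phi$ the normal ordinate; this definition is not stated in the text, but it reproduces the quoted value $\tau_3(0) = -.16287$. The function names and the grid search are illustrative choices.

```python
import math

def phi(h):
    """Ordinate of the standard normal curve."""
    return math.exp(-0.5 * h * h) / math.sqrt(2.0 * math.pi)

def upper_tail(h):
    """(1/2)(1 - alpha): area of the normal curve beyond h."""
    return 0.5 * math.erfc(h / math.sqrt(2.0))

def tau3(h):
    """Third tetrachoric function (assumed definition, see lead-in)."""
    return (h * h - 1.0) * phi(h) / math.sqrt(6.0)

def ratio(h):
    """Tetrachoric difference from 0 to h divided by the frequency from 0 to h."""
    return (tau3(h) - tau3(0.0)) / (0.5 - upper_tail(h))

# Grid search for the upper limit x2'/sigma_x giving the largest ratio.
grid = [i / 1000.0 for i in range(1, 4001)]
h_star = max(grid, key=ratio)
g_max = ratio(h_star)

v_x = 0.04
cases = {"k*x**3": 2.0, "k*sqrt(x)": -0.5, "k/x": -2.0, "k*log(x)": -1.0}
max_err = {name: 100.0 * math.sqrt(1.5) * v_x * abs(lam) * g_max
           for name, lam in cases.items()}

print(tau3(0.0), h_star, g_max)
print(max_err)
print(upper_tail(math.sqrt(3.0)))  # tail area where the slope of tau3 vanishes
```

Under this definition the derivative of $\tau_3$ with respect to the tail area is $(h^3 - 3h)/\sqrt{6}$, which vanishes at $h = \sqrt{3} \approx 1.73$ and is numerically largest, about .82, at $h = 1$.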


Another use may be made of Equation (22). It is clear that the tetrachoric
term gives the fundamental correction on the normal distribution of $u'/\sigma_u$. But
the corrections for any two functions will have a ratio depending on the ratio of
their $\lambda_1''$'s. Thus we see that for the same range of values of $x_1'/\sigma_x$ to $x_2'/\sigma_x$ the
correction might be obtained approximately by multiplying those obtained for any
standard function (${}_0v_x$, ${}_0\lambda_1''$) by the ratio $v_x\lambda_1''/({}_0v_x\,{}_0\lambda_1'')$, paying due regard to
the signs of ${}_0\lambda_1''$ and $\lambda_1''$.


From (22) we deduce at once that 


pa u’? , SS a 

True Fre aney = I : . 4-5 dw a - I a 
rue Frequency = — @ %0 5 V30,2," (Ts = f 

NV Qar J 0,’ Cu Sil Oy/ Cx 
But the double sign is determined by that of $f_1'(1)$, being, now that it is transferred
to the other side of the equation, the opposite sign to that of $f_1'(1)$. We may
accordingly write the correction in the form


| = fRLOIOE)--). 


This shows that as far as the factor in curled brackets is concerned the sign of
$f_1'(1)$ is indifferent. The factor therefore depends solely on the sign of $f_1''(1)$. If
$f_1''(1)$ and the tetrachoric difference be of the same sign, the correction must be
subtracted; if they be of opposite signs, it must be added. The sign of the tetrachoric
difference depends entirely on the range of values selected for $u_1'$ and $u_2'$ and
so for $x_1'$ and $x_2'$; it can be positive, negative or zero. The sign of $f_1''(1)$ corresponds
to an important physical relation. We have:
$$u_{mean} = f_1(1) + \tfrac12 v_x^2 f_1''(1), \text{ nearly},$$
$$u_{median} = f_1(1),$$
$$u_{mean} - u_{median} = \tfrac12 v_x^2 f_1''(1).$$
Or, when $f_1''(1)$ is positive the mean is greater than the median and therefore
greater than the mode (as in $u = kx^3$ and $u = k/x$); but if $f_1''(1)$ be negative (as in
$u = k\sqrt{x}$ and $u = k\log x$) then the mode is greater than the mean.


Second Method. The same result can be reached in a different manner. On the
left-hand side of the equation (21), if we only know the values of $u_1'$ and $u_2'$ corresponding
respectively to $x_1'$ and $x_2'$ we can evaluate the above errors from the table
of the probability integral simply.
The equation (7) gives the value of $u'/\sigma_u$ corresponding to $x'/\sigma_x$ but it is better
calculated to slightly more accuracy for our present purpose.

Retaining in (4) and (6) a few more terms we find:


a 1 » (f(a? 1 ay ass eg a’ \8 / ' asi (fa! \4 \ ; i (/a’\3 ) 
nu! ‘ a, 78’ } Ca =i ss 6 Ua Ay ei = el + va Very ae } — 2B) +735 Ux AY (=) —2P; ( 


~ ( + Ugry wal ay + 10,2 a Bo + doer,"? (eB, -1)+ qe 2B + 0g” {aR — V Bit) 4 








+ a0,’ pcy 5 39 V_'Ay es (Bs ind a2) . gate? (28, ad aP) i 
be (23). 
In case of normal distribution of $x$ this reduces to:
« , ” v \2 9 wr x iv ) a S 
4 . + 4urry, \(: ) — 1h +4ver, (¢ ) + i Ve Ay (2 a) a 3h + + 735%, on (2 ) 
w 4 To ox Tx & <2 (23) bis 
ou Sis {1 a U7 Ay + $0,7r,"" — } U_'Ay" + 4 Ug ry’ _ = a Ux aI 
From (23) bis we get Table VI.


Third Method. In the equation (23) corresponding values of $u'/\sigma_u$ and $x'/\sigma_x$ are independent
of $k$ and depend not on the magnitudes of $u'$, $x'$, $\bar{u}$, $\bar{x}$, $\sigma_u$, $\sigma_x$ but on their
relative values, i.e. $x'/\sigma_x$, $\bar{x}/\sigma_x$, $u'/\sigma_u$, $\bar{u}/\sigma_u$; therefore we can give $k$, $\bar{x}$, $\sigma_x$ special values
without losing any generality, if we preserve the same value of $v_x$. Thus for
$v_x = .04$:

A. $u = kx^3$.
Let $k = 1$, $\bar{x} = 100$, $\sigma_x = 4$.
Then by (2)
$$\bar{u} = f_1(1) + \tfrac12 v_x^2 f_1''(1) = 1004800 \tag{24}$$
by (5)
$$\sigma_u = .04 \times 3000000\,\sqrt{1 + .0016 \times 2 + .0016 \times 2 + .00000426667} = 120383.643 \tag{25}$$
$$120383.643\,\frac{u'}{\sigma_u} + 1004800 = \left(4\,\frac{x'}{\sigma_x} + 100\right)^3 \tag{26}$$

B. $u = k\sqrt{x}$.
Let $k = 1$, $\bar{x} = 100$, $\sigma_x = 4$.
Then
$$\bar{u} = 9.997996987 \tag{27}$$
$$\sigma_u = .2001405506 \tag{28}$$
$$.2001405506\,\frac{u'}{\sigma_u} + 9.997996987 = \sqrt{4\,\frac{x'}{\sigma_x} + 100} \tag{29}$$

C. $u = k/x$.
Let $k = 1$, $\bar{x} = 100$, $\sigma_x = 4$.
Then
$$\bar{u} = .01001607741 \tag{30}$$
$$\sigma_u = .0004025869625 \tag{31}$$
$$.0004025869625\,\frac{u'}{\sigma_u} + .01001607741 = \frac{1}{4\,\dfrac{x'}{\sigma_x} + 100} \tag{32}$$

D. $u = k\log x$.
Let $k = 1$, $\bar{x} = 100$, $\sigma_x = 4$.
Then
$$\bar{u} = 4.604368260 \tag{33}$$
$$\sigma_u = .0401283337 \tag{34}$$
$$.0401283337\,\frac{u'}{\sigma_u} + 4.604368260 = \log_e\left(4\,\frac{x'}{\sigma_x} + 100\right) \tag{35}$$

TABLE VI.

Corresponding Values of $u'/\sigma_u$ and $x'/\sigma_x$.

 x'/σ_x |  u = kx³    |  u = k√x    |  u = k/x    | u = k log x
 +2.5   |  2.7096705  |  2.4487426  | -2.2980793  |  2.3951220
 +1.5   |  1.5468547  |  1.4871210  | -1.4459359  |  1.4720481
 +1.0   |   .9973448  |   .9995075  |  -.9952935  |   .9973660
  +.5   |   .4685687  |   .5071829  |  -.5269799  |   .5134663
  +.1   |   .0602085  |   .1098376  |  -.1388953  |   .1194652
   0    |  -.0398725  |   .0100075  |  -.0399337  |   .0199839
  -.1   |  -.1391557  |  -.0900224  |   .0598227  |  -.0798962
  -.5   |  -.5283774  |  -.4921649  |   .4669918  |  -.4834685
 -1.0   |  -.9973448  |  -.9994880  |   .9950392  |  -.9973022
 -1.5   | -1.4471734  | -1.5121234  |  1.5455556  | -1.5219539
 -2.5   | -2.2910090  | -2.5540191  |  2.7199667  | -2.6056007

By (26), (29), (32) and (35) we find on testing exactly the same results as those
in Table VI. Moreover by these equations the inverse problem, i.e. to find the
value of $x'/\sigma_x$ corresponding to the given value of $u'/\sigma_u$, can be solved quite easily.
I next proceeded to compute Tables for the above selected functions as typical
cases, taking no longer total integrals from $x_1' = 0$ up to some value $x_2'$, but integrals
over various short ranges shown in Column (1) of the several sections of Table VII.
While the errors for $\sqrt{x}$ do not reach anywhere 2%, and those for $\log x$ do not
reach 4%, those for $x^3$ and $1/x$ may run up to 7% or in one case 7.5%.
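
The correspondence between $x'/\sigma_x$ and $u'/\sigma_u$, and the inverse problem, can be sketched as follows. The means and standard deviations are the constants quoted in the text for $k = 1$, $\bar{x} = 100$, $\sigma_x = 4$; the dictionary layout and the names `u_std` and `x_std` are illustrative assumptions. In a spot check the values so computed agree with Table VI to about four decimal places.

```python
import math

X_BAR, SIGMA_X = 100.0, 4.0

# name: (function, inverse function, mean of u, s.d. of u), constants as quoted in the text
CASES = {
    "x^3":     (lambda x: x ** 3,    lambda u: u ** (1.0 / 3.0), 1004800.0,     120383.643),
    "sqrt(x)": (math.sqrt,           lambda u: u * u,            9.997996987,   0.2001405506),
    "1/x":     (lambda x: 1.0 / x,   lambda u: 1.0 / u,          0.01001607741, 0.0004025869625),
    "log(x)":  (math.log,            math.exp,                   4.604368260,   0.0401283337),
}

def u_std(case, t):
    """u'/sigma_u corresponding to x'/sigma_x = t."""
    f, _, u_bar, sigma_u = CASES[case]
    return (f(X_BAR + SIGMA_X * t) - u_bar) / sigma_u

def x_std(case, s):
    """Inverse problem: x'/sigma_x corresponding to u'/sigma_u = s."""
    _, f_inv, u_bar, sigma_u = CASES[case]
    return (f_inv(u_bar + sigma_u * s) - X_BAR) / SIGMA_X

for name in CASES:
    print(name, [round(u_std(name, t), 7) for t in (2.5, 1.0, 0.0, -1.0, -2.5)])
```

For instance `x_std("x^3", 2.7096705)` returns a value very close to 2.5, which is the inverse reading of the first row of the table.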


(4) Effect of shifting the Origin of the Normal Curve for $u$. An examination
of Table VII indicates that while the normal curve for $u$ gives a considerable
number of percentage errors which are negligible from the statistical standpoint,
there are others which we cannot overlook in practice. If we treat the $v_x$ term
as the most important, we have from (20)


LZ) hag gee” ((£) -3 (=))}- : H(z) & da (36). 


V Qa Ou | Vor 


Hence by aid of (7), assuming normal distribution for $x$, we have to the same
degree of approximation:


2 ey i fh ey wae (31), 











V Qer ou 


which is the true frequency. 


Now it is not feasible to take a function
$$\frac{1}{\sqrt{2\pi}\,(\sigma_u + \epsilon)}\, e^{-\frac12\left(\frac{u' + h}{\sigma_u + \epsilon}\right)^2}$$
and expanding in powers and products of $h$ and $\epsilon$ to get by choice of $h$ and $\epsilon$ a form like the above.
The expansion consists of the term $\frac{1}{\sqrt{2\pi}}\, e^{-\frac12 (u'/\sigma_u)^2}\,\frac{du'}{\sigma_u}$ multiplied by a polynomial
in $u'/\sigma_u$ and we have only two available constants $h$ and $\epsilon$ to make the coefficients of
$u'/\sigma_u$, $(u'/\sigma_u)^2$ and $(u'/\sigma_u)^3$ agree with the above values. We can get rid of $(u'/\sigma_u)^2$,
but not of both the linear and the cubic terms. There is no point in getting rid
of one of them, as they are approximately of the same order when $u'$ and $\sigma_u$ are
not too different. Accordingly it is clear that we cannot find a normal curve
which even to the term in $v_x$ will give the true frequency of $u'$. One point,
TABLE VII.

Percentage Frequency Errors, when we use a Normal Curve
having its Mean at the Mean of u.

TABLE VIII.

Percentage Frequency Errors, when we use a Normal Curve
having its Mean at the Median of u.

$u = kx^3$.
however, is worthy of note. If we are dealing with numerical variations only,
i.e. integrating $u'$ from $-\chi\sigma_u$ to $+\chi\sigma_u$, then if terms in $v_x^2$ are negligible we have


; = oe e : (=) =~ B. ws t cy “ Recepeneted (38) 


VQ) -x0n Cw V2@rJ x,’ 
or, a normal distribution does give the true frequency. 


In view of the fact that we cannot obtain a normal curve for $u$ for which the
coefficient of $v_x$ will vanish, it seemed worth while to investigate what improvement
might result by shifting the origin of the normal curve for $u$. We have
so far tested for the origin of the normal curve at the mean of $u$; other reasonable
assumptions would be the origin at the median of $u$ and at the mode of $u$,
without in either case changing the constant $\sigma_u$.

Table VIII shows the percentage errors which arise when we shift the origin
of the normal curve to the median of $u$, i.e. to the point which corresponds in
$u$ to $x'/\sigma_x = 0$. The general result of this shifting is to reduce in all cases the
percentage errors of the central frequencies, $+1.5$ to $-1.5$, but the tail frequencies
on either side are emphasised. In the case of $kx^3$ and $k/x$ they are badly
emphasised.

Lastly I considered what would happen, if the frequency curve for $u'$ were
treated as the normal curve shifted to the mode of $u$.


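In order to deal with this, we must first note that the true frequency curve is
$$y = \frac{1}{\sqrt{2\pi}\,\sigma_x}\, e^{-\frac12\left(\frac{x - \bar{x}}{\sigma_x}\right)^2} \tag{39}$$
and accordingly the true form of the frequency curve for $u$ is
$$y_u = \frac{1}{\sqrt{2\pi}\,\sigma_x}\, e^{-\frac12\left(\frac{x - \bar{x}}{\sigma_x}\right)^2}\,\frac{dx}{du} \tag{40}$$
where $x$ will be a function of $u$.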


We will consider this curve for our special cases, determining the mode and
modal ordinate.

A. $u = kx^3$.
Here
$$y = \frac{1}{3\sqrt{2\pi}\,\sigma_x}\, k^{-\frac13} u^{-\frac23}\, e^{-\frac12\left(\frac{k^{-1/3} u^{1/3} - \bar{x}}{\sigma_x}\right)^2}.$$
For our special case $k = 1$, $\bar{x} = 100$, $\sigma_x = 4$. Differentiating, the modal value of $u$ is
given by the equation
$$u_{mo}^{2/3} - 100\,u_{mo}^{1/3} + 32 = 0.$$
This leads to
$$u_{mo} = 990399.966, \qquad u'_{mo} = -14400.033, \qquad u'_{mo}/\sigma_u = -.1196179,$$
and the modal ordinate
$$y_{mo} = .4015067/\sigma_u.$$

B. $u = k\sqrt{x}$.
Here
$$y = \frac{2}{\sqrt{2\pi}\,\sigma_x}\, k^{-2} u\, e^{-\frac12\left(\frac{k^{-2} u^{2} - \bar{x}}{\sigma_x}\right)^2} \tag{41}$$
or, with the same values as before we have
$$u_{mo}^4 - 100\,u_{mo}^2 - 8 = 0,$$
leading to
$$u_{mo} = 10.003996066, \qquad u'_{mo} = .005999076, \qquad u'_{mo}/\sigma_u = .029974315,$$
and $y_{mo} = .3993024/\sigma_u$.

C. $u = k/x$.
We have
$$y = \frac{1}{\sqrt{2\pi}\,\sigma_x}\, k u^{-2}\, e^{-\frac12\left(\frac{k u^{-1} - \bar{x}}{\sigma_x}\right)^2} \tag{42}$$
This results, with the same values of $\sigma_x$, $k$ and $\bar{x}$, in
$$32\,u_{mo}^2 + 100\,u_{mo} - 1 = 0,$$
or, $u_{mo} = .009968203$, $u'_{mo} = -.00004787423$, $u'_{mo}/\sigma_u = -.1189165$,
and $y_{mo} = .4028054/\sigma_u$.

D. $u = k\log x$.
Here
$$y = \frac{1}{\sqrt{2\pi}\,\sigma_x}\, k^{-1} e^{u/k}\, e^{-\frac12\left(\frac{e^{u/k} - \bar{x}}{\sigma_x}\right)^2}.$$
The equation for $u_{mo}$ is
$$e^{2u_{mo}} - 100\,e^{u_{mo}} - 16 = 0,$$
leading to $e^{u_{mo}} = 100.1597448$, which gives $u_{mo} = 4.606766$, $u'_{mo} = .002398106$,
$u'_{mo}/\sigma_u = .05976092$ and $y_{mo} = .4005420/\sigma_u$.

The $y_{mo}$ have all been expressed in terms of $\sigma_u$ as unit, so that in the diagrams
$u'/\sigma_u$ may be taken as abscissa and the curves still represent the frequency.
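
Each modal equation above is a quadratic in a simple transform of $u$ (in $u^{1/3}$, $u^2$, $u$ and $e^u$ respectively), so the modes and modal ordinates can be recomputed in a few lines. The sketch below takes $k = 1$, $\bar{x} = 100$, $\sigma_x = 4$ and the values of $\sigma_u$ quoted in the text; the helper names are illustrative assumptions. For the cube both roots of the quadratic are positive, and the larger one, near $\bar{x}$, is taken as the mode.

```python
import math

X_BAR, SIGMA_X = 100.0, 4.0

def larger_root(a, b, c):
    """Larger real root of a*z**2 + b*z + c = 0."""
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

def normal_x(x):
    """Normal frequency curve (39) for x."""
    z = (x - X_BAR) / SIGMA_X
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * SIGMA_X)

# Modal value of u from the quadratic quoted for each case.
mode = {
    "x^3":     larger_root(1.0, -100.0, 32.0) ** 3,         # quadratic in u^(1/3)
    "sqrt(x)": math.sqrt(larger_root(1.0, -100.0, -8.0)),   # quadratic in u^2
    "1/x":     larger_root(32.0, 100.0, -1.0),              # quadratic in u
    "log(x)":  math.log(larger_root(1.0, -100.0, -16.0)),   # quadratic in e^u
}

# x as a function of u, and |dx/du|, for each case.
inverse = {
    "x^3": lambda u: u ** (1.0 / 3.0),
    "sqrt(x)": lambda u: u * u,
    "1/x": lambda u: 1.0 / u,
    "log(x)": math.exp,
}
jacobian = {
    "x^3": lambda u: u ** (-2.0 / 3.0) / 3.0,
    "sqrt(x)": lambda u: 2.0 * u,
    "1/x": lambda u: u ** -2.0,
    "log(x)": math.exp,
}
sigma_u = {"x^3": 120383.643, "sqrt(x)": 0.2001405506,
           "1/x": 0.0004025869625, "log(x)": 0.0401283337}

# Modal ordinate (40) expressed with sigma_u as unit.
ordinate = {name: normal_x(inverse[name](mode[name])) * jacobian[name](mode[name]) * sigma_u[name]
            for name in mode}

print(mode)
print(ordinate)
```

The ordinates all lie close to .3989, the central ordinate of a normal curve of unit standard deviation.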

Table IX gives the percentage errors of the frequencies when we take the
normal curve to have its origin at the mode of the $u$-distribution.

Examining Tables VII, VIII and IX we see that the best results between
$x'/\sigma_x = -1.0$ to $+1.0$ are obtained by placing the origin of our normal curve
at the mode of $u$. Between $x'/\sigma_x = -1.5$ to about $-2.0$ and between $x'/\sigma_x = 1.5$
to $2.0$, the normal curve with its origin at the median gives a lesser percentage
error in estimating the frequency, while in the range outside the limits $x'/\sigma_x = \pm 2$
the normal curve with its origin at the mean of $u$ is most adequate, although in
the case of $kx^3$ and $k/x$ it will still lead to errors in evaluating these tails of
4 to 5%*.


In Diagrams I to IV the true curve of distribution of $u'/\sigma_u$ is given for the
four functions
$u = kx^3$, $u = k\sqrt{x}$, $u = k/x$, and $u = k\log x$.
* Table VI indicates that the round values here given for $x'/\sigma_x$ may for practical purposes be taken
to be those for $u'/\sigma_u$.
TABLE IX.

Percentage Frequency Errors, when we use a Normal Curve
having its Mean at the Mode of u.

DIAGRAM I
In each case the normal curve with origin at the mean and standard deviation
$\sigma_u$ is depicted. If the reader supposes this normal curve to be bodily shifted
until its axis passes from the mean to the mode of $u$, he will readily understand
how the fit is rendered worse at the tails, improves in the middle region when we
reach the median and is best round the axis when we reach the mode.


(5) General Conclusions. 


(a) Formulae have been found enabling us to find the mean and standard 
deviation of a variable w which is a function of « in terms of those of a. 

(b) It is shown that the correlation coefficient of u and z is for all cases likely 
to occur in ordinary practice (i.e. the customary values of v,) very close to unity. 

(c) Given two variates u and w, functions respectively of « and y, an expression 
is obtained for the correlation coefficient 7,,,,, and it is shown that for the customary 
values of v, and vy, Tyw is very nearly ry. 

(d) Finally an investigation is made into the frequency distribution of wu, if 
that of 2 is known to be normal. It is shown that a normal curve for u fitted in 
the usual way by the mean and standard deviation of uw does not give very good 
results ; errors may run up to 7°/,*. Better agreement is obtained if we shift 
this normal curve bodily from the mean of the u-distribution through its median 
to the mode. It is then seen that the modal centering gives the best result when 
u'/oy lies between — 1:0 and + 1°0; that the median centering works best for the 
ranges w'/o,= 1:0 to 2:0 or —1°0 to — 2-0, while the mean centering gives the 
best results for the tails when w’/c, is greater than 2:0 numerically. 

(e) If we desire a frequency stretching from w’/o.=—y to + x, where y has any 
numerical value, then the mean centered normal curve for wu’ will be found to give 
reasonable results owing to the vanishing of the odd terms multiplying 2,. 


(f) It is always possible to calculate the exact value of the frequency of u, by 
determining from the function which wu is of «, the values of # which correspond 
to the required values of uw (as in Equations (26), (29), (32) and (35)) and then 
computing the known frequency of «, which will be that of the required range 
of u. 

(g) In most statistical investigations, however, one variate is not any actual 
mathematical function of a second. Thus skull-capacity and brain-weight are not 
mathematically determined by the product of three diameters of the head, and if 
the latter three follow normal distributions, the former two are just as likely to do
so also, as not to do so. For such physiological functional relations, the
mathematical corresponding functions only roughly shadow reality, and even percentage
errors of 7 or more are extremely likely to be swamped by the mass of additional 
variates on which the given variate really depends for its variation, besides those 
we endeavour to comprehend in the “ mathematical ” function. 


I am indebted to Professor Pearson for his kindly advice and criticism during 
the course of this investigation, and to Miss Ida McLearn for the preparation of
the diagrams. 


* Perhaps 1% of the total frequency.











ON A CRITERION FOR THE REJECTION OF OUTLYING 
OBSERVATIONS. 


By J. O. IRWIN, M.A., M.Sc. 


IN a previous paper* it was shown that if samples of a given size be taken at 
random from a normal population and if the individuals of each be arranged in 
descending order of magnitude with regard to some character, it is possible to 
obtain the frequency distribution of the differences between the pth and p + 1th 
individuals in such samples. 


For p = 1 and for p = 2, that is for the differences between the first and second 
and the differences between the second and third individuals, these frequency 
distributions were found to be very closely representable by curves of the form 

$$y = y_0\, e^{-\frac{(x+h)^2}{2\Sigma^2}},$$

the range being $x = 0$ to $x = \infty$.

Diagrams were given from which it is possible to obtain the appropriate values
of $y_0$, $h$ and $\Sigma$ for samples of any size up to 1000 individuals. We are thus able
to calculate from these frequency distributions the probability that the difference
(1) between the first and second and (2) between the second and third individuals
in a sample of given size should be greater than $\lambda$ times the standard-deviation
of the population from which the sample is taken, where we may give $\lambda$ any
numerical value we please. If this probability becomes sufficiently small it clearly
becomes admissible to reject (1) the first and (2) the first two individuals as not
belonging to the same homogeneous group as the remainder. Thus a table of
this probability for varying values of $\lambda$ provides a criterion for the rejection† of
outlying observations.

The frequency distribution of the differences between the pth and p+ 1th 
individuals (p = 1 or 2) is given by 


$$y = y_0\, e^{-\frac{(x+h)^2}{2\Sigma^2}}.$$


* J. O. Irwin, “The Further Theory of Francis Galton’s Individual Difference Problem,” Biometrika,
Vol. XVII. pp. 100–128.

† By rejection we mean the realisation of the fact that the particular observations in question
probably do not belong to the same homogeneous group as the rest, and may therefore be left out of 
consideration in calculations concerning this group. If our observations are all observations of the 
same physical quantity we have a frequency distribution of errors and we realise that the outlying 
observations are due to some disturbing cause, possibly a blunder in measurement. But if the obser- 
vations are a series of measurements of some character in different individuals, the outlying observations 
are those of individuals who are anomalous with regard to the character in question. We merely reject 
them because we realise they do not belong to the group we are studying; for other purposes such 
anomalies may be of the highest interest and importance. 


Let the standard-deviation of the original population be $\sigma$, let $x = x'\Sigma$, $h = h'\Sigma$,
$\lambda\sigma = \lambda'\Sigma$ and let $P(\lambda)$ be the chance of a difference being greater than $\lambda\sigma$.


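Then

$$P(\lambda) = \frac{\int_{\lambda\sigma}^{\infty} e^{-\frac{(x+h)^2}{2\Sigma^2}}\,dx}{\int_{0}^{\infty} e^{-\frac{(x+h)^2}{2\Sigma^2}}\,dx} = \frac{\int_{\lambda'}^{\infty} e^{-\frac{1}{2}(x'+h')^2}\,dx'}{\int_{0}^{\infty} e^{-\frac{1}{2}(x'+h')^2}\,dx'}.$$

Let $x' + h' = X$.

Then

$$P(\lambda) = \frac{\int_{\lambda'+h'}^{\infty} e^{-\frac{1}{2}X^2}\,dX}{\int_{h'}^{\infty} e^{-\frac{1}{2}X^2}\,dX}.$$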

$\Sigma/\sigma$ and $h/\sigma$ being known when the size of the sample is given, it is possible to
compute tables of $P(\lambda)$ from the tables of the probability integral. Table I gives
the values of $h/\sigma$ and $\Sigma/\sigma$ for values of $n$ from 2 to 1000. These values have

TABLE I.

          p = 1                   p = 2
     h/σ       Σ/σ           h/σ       Σ/σ

     .00      1.414           –         –

     .52      1.271           .52      1.271
    1.29      1.095          1.61       .878
    1.72      1.082          1.91       .822
    1.91      1.072          2.06       .802

    2.04      1.065          2.20       .793

    2.13      1.060          2.30       .785
    2.20      1.055          2.37       .782
    2.26      1.051          2.43       .779
    2.31      1.048          2.49       .777
    2.35      1.047          2.54       .775

    2.38      1.046          2.59       .774

    2.63      1.038          2.80       .765
    2.76      1.033          2.92       .758
    2.84      1.030          3.01       .752
    2.90      1.027          3.07       .748
    2.95      1.026          3.11       .745
    3.00      1.025          3.13       .742

    3.04      1.025          3.15       .739

    3.08      1.025          3.16       .736

«ele 1°0 ). | 
been obtained from the diagrams published in the previous paper*, and by their
aid Tables II and III have been computed. Table II gives $P_1(\lambda)$, the probability
that the first and second individuals should differ by more than $\lambda$ times the
standard-deviation of the population from which the sample is drawn; Table III
gives the same function ($P_2(\lambda)$) for the second and third individuals.
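
The computation of such a table can be sketched numerically. The fitted curve is a normal curve of standard-deviation $\Sigma$ centred at $-h$ and cut off at $x = 0$, so $P(\lambda)$ is the ratio of the normal tail area beyond $(\lambda\sigma + h)/\Sigma$ to that beyond $h/\Sigma$. The function and argument names below are illustrative assumptions, not the author's notation, and the constants $h/\sigma$ and $\Sigma/\sigma$ must be read from Table I for the sample size in hand.

```python
import math


def upper_tail(z):
    """Area of the unit normal curve beyond z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def gap_exceedance(lam, h_over_sigma, cap_sigma_over_sigma):
    """P(lambda) for the curve y = y0 * exp(-(x + h)^2 / (2 Sigma^2)), x >= 0.

    lam is the gap in units of the population standard-deviation; the other
    two arguments are the Table I constants h/sigma and Sigma/sigma.
    """
    beyond_gap = upper_tail((lam + h_over_sigma) / cap_sigma_over_sigma)
    whole_curve = upper_tail(h_over_sigma / cap_sigma_over_sigma)
    return beyond_gap / whole_curve


# First row of Table I for p = 1: h/sigma = 0, Sigma/sigma = 1.414.
p_first_row = gap_exceedance(1.0, 0.0, 1.414)
```

With $h = 0$ the ratio reduces to twice the normal tail area beyond $\lambda\sigma/\Sigma$. Values for sample sizes between the tabulated rows would need interpolated constants, so this sketch should only be expected to agree approximately with Tables II and III.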


It is clear that Table II provides a criterion for the rejection of one outlying
observation. It is a matter of opinion, which the mathematician cannot settle,
how small $P(\lambda)$ must be before it is justifiable to reject the observation, but
about .05 seems reasonable while .01 is perhaps on the side of caution. If the
first observation has been rejected the second may be examined in the same way,
using $(n-1)$ as the number of observations instead of $n$. Strictly $\lambda$ is the
ratio of the difference between the first and second individuals to the
standard-deviation of the original population. The latter is in general unknown and we are
obliged to take the standard-deviation of the sample as the best approximation to
it obtainable. In calculating the standard-deviation of the sample we are on the
safe side if we include the outlying observation.


If it happens that there are two outlying observations rather close together
and then a large gap, we should turn to Table III and find $P_2(\lambda)$ the chance that
the second and third individuals should differ by as much or more than they do.
If we find $P(\lambda)$ is sufficiently small, say less than .01, we are justified in rejecting
the first two observations.


Suppose now that there are p observations (p > 2) close together and then a 
large gap, we really require the probability integral of the distribution of differ- 
ences between the pth and p+ 1th individuals to deal with this case; but the 
numerical values of the constants of these distributions have not so far been 
determined. If however we remember that the probability of the pth and p+ 1th 
individuals differing by more than a given amount (in a sample of given size n) 
decreases with $p$† (until $p = \frac{n}{2} - 1$ if $n$ is even and $\frac{n-3}{2}$ if $n$ is odd) it follows
that the value of $P(\lambda)$ obtained by using Table III when $p$ is greater than 2 is
really too large. Consequently if this value is less than say .01, the true value
will be still less and the outlying observations may be rejected. There may of
course be some border line cases such as would arise, for example, if we had $p = 4$
and $P(\lambda) = .1$ (as given by Table III). We should know that the true value of
$P(\lambda)$ was less than .1 but we should not know how much less. In cases of doubt
retention rather than rejection would seem the safer course. As it does not often 


* Biometrika, Vol. XVII. pp. 123–5.
† This has not been rigorously demonstrated. It follows easily from two assumptions.
(1) That the mean difference between $p$th and $p+1$th individuals in a sample of given size decreases
as $p$ increases until $p = \frac{n}{2} - 1$ if $n$ be even and $\frac{n-3}{2}$ if $n$ be odd.
(2) That the frequency distribution of differences between the $p$th and $p+1$th individuals is a
J curve with range from 0 to $\infty$. The numerical evidence that we have supports these assumptions.
See Biometrika, Vol. I. p. 397 and Vol. XVII. pp. 104–122.
happen that there are more than three or four outlying observations the above
tables would seem to be adequate to deal with most cases that arise. 

It seems desirable to mention one or two other criteria which have been 
suggested for the rejection of outlying observations and to compare the results 
they give with those given by the one suggested here. 


(1) Peirce’s Criterion* gives a complicated formula based on the principle 
“that the proposed observations should be rejected when the probability of the 
system of errors obtained by retaining them is less than that of the system 
obtained by their rejection multiplied by the probability of obtaining so many 
and no more abnormal observations.” Apart from the artificiality of the principle 
the proof of Peirce’s criterion reproduced in Chauvenet’s astronomy “nearly in 
the words of its author and with only some slight changes in notation” seems in 
several places open to criticism. It is well-known to have been severely criticised
by Airy†.

(2) Chauvenet gives a criterion for the rejection of one observation‡. Pointing
out that the probability of an error less than $t$ times the probable error of a
single observation is

$$\Theta(\rho t) = \frac{2}{\sqrt{\pi}} \int_0^{\rho t} e^{-t^2}\,dt,$$

where $\rho = \dfrac{.67449}{\sqrt{2}} = .47694$,

he shows that it follows that in $m$ observations the number to be expected
numerically greater than $t'$ times the probable error will be

$$m\,[1 - \Theta(\rho t')],$$

or in modern notation

$$m\left[1 - \frac{2}{\sqrt{2\pi}\,\sigma} \int_0^{rt'} e^{-\frac{x^2}{2\sigma^2}}\,dx\right].$$

Chauvenet then says “If this quantity is less than ½, it will follow that an error of
the magnitude $rt'$ ($r$ being the probable error) will have a greater probability
against it than for it, and may therefore be rejected.” He therefore rejects those
errors for which

$$m\,[1 - \Theta(\rho t')] < \tfrac{1}{2}, \quad \text{or} \quad \Theta(\rho t') > \frac{2m-1}{2m}.$$

But the above argument is clearly wrong for $m[1 - \Theta(\rho t')]$ is not a probability
but a number, e.g. if $m = 100$, $\Theta(\rho t') = \tfrac{1}{2}$,

$$m\,[1 - \Theta(\rho t')] = 50.$$

What Chauvenet really does is to reject errors greater than $t'$ times the probable
error where $t'$ is such that only half an error is to be expected greater than this


* W. Chauvenet, A Manual of Spherical and Practical Astronomy, Fourth Edition, Vol. II. p. 558.
† Cambridge Astronomical Journal, Vol. IV. p. 137.
‡ W. Chauvenet, A Manual of Spherical and Practical Astronomy, Fourth Edition, Vol. II. p. 565.
limit. But the choice of one half in this connection rather than any other proper
fraction seems quite arbitrary. If one observation has been rejected by Chauvenet’s 
criterion it may, he says, be applied to the next giving m a value less by one than 
before. 
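
Chauvenet's rule as stated above can be sketched in a few lines. Since $\Theta$ is the probability of an error numerically less than the limit, the condition $\Theta > (2m-1)/2m$ puts the one-sided normal integral at $(4m-1)/4m$. The function names are assumptions made for illustration, and the limit is returned in units of the standard-deviation rather than of the probable error.

```python
from statistics import NormalDist


def chauvenet_limit(m):
    """Multiple of the standard-deviation beyond which Chauvenet's rule rejects.

    m is the number of observations. The two-sided probability (2m - 1) / 2m
    corresponds to a one-sided normal integral of (4m - 1) / 4m.
    """
    if m < 1:
        raise ValueError("m must be a positive number of observations")
    return NormalDist().inv_cdf((4 * m - 1) / (4 * m))


def chauvenet_rejects(deviation, sigma, m):
    """True when a deviation from the mean exceeds the limit for m observations."""
    return abs(deviation) > chauvenet_limit(m) * sigma
```

For 15 observations the limit comes to about 2.13 standard-deviations, and for 17 observations to about 2.178, which are the multiples used in the worked examples of this paper.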


(3) A simple method that is often employed is to find from the ordinary
tables of the probability integral the probability of an observation so divergent as
the outlying one occurring at all. For example suppose we find in a series of
1000 observations one which is greater than the mean by 3.5 times the
standard-deviation and that the one before it is greater than the mean by 3.0 times the
standard-deviation. We find from the tables of the probability integral the chance of
an observation occurring more distant from the mean than the mid-point between
these two, that is 3.25 times the standard-deviation from the mean. That is
.0006, or in 1000 observations we should expect .6 of an observation to be so
distant; and we should not be justified in rejecting it. But if the outlying
observation were four times the standard-deviation from the mean, noting that the
probability of a deviation greater than $3.5\sigma$ is .0002 and greater than $3.75\sigma$
is .0001, we should incline to rejection, while we should certainly reject an
observation whose deviation from the mean was $4.5\sigma$ seeing that the probability of a
deviation greater than $4.0\sigma$ is .00003.
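
The arithmetic of method (3) is easily reproduced. The sketch below, with function names chosen only for illustration, takes the normal tail area beyond the mid-point between the outlying observation and its neighbour and multiplies it by the number of observations.

```python
import math


def upper_tail(z):
    """Probability of a deviation greater than z standard-deviations (one side)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def expected_beyond(n, z):
    """Number of observations out of n expected beyond z standard-deviations."""
    return n * upper_tail(z)


# Outlier at 3.5 sigma, neighbour at 3.0 sigma, series of 1000 observations.
midpoint = 0.5 * (3.5 + 3.0)
print(round(upper_tail(midpoint), 4))
print(round(expected_beyond(1000, midpoint), 1))
```

The two printed figures correspond to the .0006 and .6 quoted in the text.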


This test may well be used to supplement the criterion we have suggested in 
this paper. All these criteria are of course based on normality, they would not be 
applicable where the material is very different from the normal; but they will 
apply in most cases to errors of observation and to anthropometric measurements. 


It must always be remembered that caution is necessary in dealing with small 
samples as the standard-deviation of the sample is then an unreliable measure of 
the standard-deviation of the original population.


We now proceed to discuss some examples: 


(α) We take from Chauvenet’s Astronomy* the following fifteen observations
of the vertical semi-diameters of Venus made by Lieut. Herndon in 1846.


Deviations from the Mean.


    −0″.30     −0″.24     −1″.40     +0″.18
    −0″.44     +0″.06     −0″.22     +0″.39
    +1″.01     +0″.63     −0″.05     +0″.10
    +0″.48     −0″.13     +0″.20


Peirce’s criterion is applied by Chauvenet and leads to the rejection of two
observations, −1″.40 and 1″.01.


We have $\sigma = 0''.5326$,

$$\frac{2m-1}{2m} = \frac{29}{30} = .9667.$$





* Fourth Edition, Vol. II. p. 562.

Thus Chauvenet’s criterion would lead to the rejection of observations
numerically greater than $k\sigma$, where

$$\frac{2}{\sqrt{2\pi}\,\sigma} \int_0^{k\sigma} e^{-\frac{x^2}{2\sigma^2}}\,dx = .9667,$$

or if, as usual,

$$\tfrac{1}{2}(1 + \alpha_k) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{k\sigma} e^{-\frac{x^2}{2\sigma^2}}\,dx,$$

we have $\alpha_k = .9667$,
$\tfrac{1}{2}(1 + \alpha_k) = .9833$,
$k\sigma = 2.13\sigma = 1''.13$.

This leads to the rejection of −1″.40.

For the remaining 14 observations
$\sigma = 0''.4048$,

$$\frac{2m-1}{2m} = \frac{27}{28} = .9643.$$

Or we have $\alpha_k = .9643$,
$\tfrac{1}{2}(1 + \alpha_k) = .9821$,
$k\sigma = 2.10\sigma = 0''.8501$.

This leads to the rejection of 1″.01.

On the repetition of the process it is found that no more observations can be
rejected.
Now let us apply our own criterion.

Difference between first and second
= 1″.01 − 0″.63 = 0″.38,

$$\frac{\text{Difference}}{\sigma} = \frac{.38}{.5326} = .713.$$

Table II gives $P_1 = .241$.

Difference between last and last but one
= 1″.40 − 0″.44 = 0″.96,

$$\frac{\text{Difference}}{\sigma} = \frac{.96}{.5326} = 1.802.$$

Table II gives $P_1 = .014$.

This seems to indicate that −1″.40 but not 1″.01 should be rejected.


We may note that

$$\frac{\tfrac{1}{2}(1.01 + 0.63)}{\sigma} = \frac{.82}{.5326} = 1.54, \qquad \tfrac{1}{2}(1 - \alpha) = .062,$$

$$\frac{\tfrac{1}{2}(1.40 + 0.44)}{\sigma} = \frac{.92}{.5326} = 1.73, \qquad \tfrac{1}{2}(1 - \alpha) = .042,$$

or in 15 observations we should expect

.93 observations > 0″.82,

.63 observations < −0″.92.

These results would not by themselves justify the rejection of either doubtful
observation but, as we have seen, the greatness of the gap between −1″.40 and
the next observation would seem to lead to its rejection.
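
The quantities entering the suggested criterion in this example can be recomputed directly from the fifteen deviations. The sketch below rests on two assumptions: the standard-deviation is taken as the root mean square of the tabulated deviations (which reproduces the 0″.5326 quoted above), and the four deviations printed without a legible sign (0.18, 0.39, 1.01, 0.48) are taken as positive. Function names are illustrative only.

```python
import math

# Deviations from the mean in seconds of arc. The signs of 0.18, 0.39, 1.01
# and 0.48 are assumed positive; they are not legible in the source table.
deviations = [-0.30, -0.44, 1.01, 0.48, -0.24, 0.06, 0.63, -0.13,
              -1.40, -0.22, -0.05, 0.20, 0.18, 0.39, 0.10]


def root_mean_square(values):
    """Standard-deviation of deviations already measured from the mean."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def end_gaps(values, sigma):
    """Gaps between the two highest and the two lowest values, in units of sigma."""
    ordered = sorted(values, reverse=True)
    top = (ordered[0] - ordered[1]) / sigma
    bottom = (ordered[-2] - ordered[-1]) / sigma
    return top, bottom


sigma = root_mean_square(deviations)
lam_top, lam_bottom = end_gaps(deviations, sigma)
```

The two ratios returned are the values of $\lambda$ with which Table II is entered; the probabilities themselves still have to be read from that table.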

The fact that these tests would lead to the retention while Peirce’s and 
Chauvenet’s criteria would lead to the rejection of the observation 1″.01 probably
indicates that the latter reject a little too easily. We must always remember 
however that in dealing with such a sample as 15, the standard-deviation of the 
sample is a very unreliable measure of the standard-deviation of the original 
population. 

In fact the distribution of standard-deviations in samples of 15 taken at
random from a normal population is given by the curve*

$$y = y_0\, x^{13}\, e^{-\frac{15x^2}{2\sigma^2}},$$

$\sigma$ being the standard-deviation of the original population and the range being
from $x = 0$ to $x = \infty$.

Now it is easily deduced from this curve that the probability of a
standard-deviation in a sample being between 0 and $x$ is

$$\frac{\Gamma_X(7)}{\Gamma(7)}, \quad \text{where } X = \frac{15x^2}{2\sigma^2},$$

and $\Gamma_X(7)$ is the incomplete $\Gamma$ function $\int_0^X e^{-X} X^6\,dX$.

From the Tables of the Incomplete $\Gamma$-function it is now easily deduced that
there is a probability of .95 of the standard-deviation of a sample being greater
than $.664\sigma$ and likewise a probability of .95 of it being less than $1.25\sigma$ while the
probability of it being less than $\sigma$ is .6220†, so that the unknown
standard-deviation of the original population may well vary between 1.5 times and 0.8
times the standard-deviation of the sample.
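
These figures can be checked without tables. For an odd sample size $n$ the index of the incomplete $\Gamma$-function, $k = (n-1)/2$, is a whole number, and the probability integral reduces to the finite sum $1 - e^{-X}\sum_{j<k} X^j/j!$. That closed form is a standard identity and not part of the paper; the function name is likewise an assumption.

```python
import math


def sd_probability(ratio, n=15):
    """Chance that the sample standard-deviation is below ratio * sigma.

    Uses the curve y = y0 * x^(n-2) * exp(-n x^2 / (2 sigma^2)); the finite-sum
    form of the incomplete Gamma integral requires n to be odd.
    """
    if n % 2 == 0 or n < 3:
        raise ValueError("this closed form needs an odd sample size of 3 or more")
    k = (n - 1) // 2                      # index of the Gamma function, 7 for n = 15
    big_x = n * ratio ** 2 / 2.0          # X = n x^2 / (2 sigma^2)
    partial = sum(big_x ** j / math.factorial(j) for j in range(k))
    return 1.0 - math.exp(-big_x) * partial


below_sigma = sd_probability(1.0)
above_lower_limit = 1.0 - sd_probability(0.664)
below_upper_limit = sd_probability(1.25)
```

The three values come out close to the .6220, .95 and .95 quoted from the Tables of the Incomplete $\Gamma$-function.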

If we had taken the former figure we should have found $P_1 = .075$ for the
observation −1″.40 which would have brought it into the region where rejection
is doubtful. If we had taken the latter figure in testing the observation 1″.01, we
should have found $P_1 = .162$ which would still not justify us in rejecting it.

In general if the standard-deviation of the sample be too low we may well reject 
observations we ought to retain, if it is too high the opposite will be the case; it 
is in fact more likely to be too low than too high, so that caution is required in 
rejecting observations from a small series. At the same time an astronomer long 


* Biometrika, Vol. VI. p. 10.
† The limits $.664\sigma$ and $1.25\sigma$ are not very different from those which would have been obtained by
using the ordinary formula $.67449\sigma/\sqrt{2n}$ for the probable error and taking 2½ times the probable error
on either side. For we have $\sigma = .5326 \pm .0656$ and the corresponding limits are $.69\sigma$ and $1.31\sigma$.
practised in observing has a wider experience of variation than that afforded by
an individual sample of this kind. His opinion on the question of the rejection or 
retention of a doubtful observation in astronomical work should therefore carry 
some weight. 


(β) Let us take the following capacities of 17 male Moriori Crania*.


Capacities in cubic centimetres.


    1230    1360    1380    1445    1630
    1260    1364    1410    1470
    1318    1378    1410    1540
    1348    1380    1420    1545

We have Mean Capacity = 1405.2,
$\sigma = 84.1 \pm 9.7$†.

Let us test 1630‡:

$$\frac{1630 - 1545}{84.1} = \frac{85}{84.1} = 1.011.$$

From Table II taking $n = 17$ we find $P_1 = .114$. This does not justify the
rejection of the observation 1630. Taking the probable error into consideration
in about 90% of cases the standard-deviation of the original population will lie
between 59.9 and 108.3. If we take the lower limit 59.9 we have

$$\frac{85}{59.9} = 1.419, \qquad P_1 = .037,$$

a value which is on the border line for rejection; but the value 59.9 being
improbable, even taking into consideration only the information afforded by the
sample, we are justified in concluding that the observation must be retained.


Using Chauvenet’s criterion we have:

$$\alpha_k = \frac{2m-1}{2m} = \frac{33}{34} = .97059,$$

$\tfrac{1}{2}(1 + \alpha_k) = .98529$,
$k\sigma = 2.178\sigma = 183.2$.

Accordingly this would lead to the rejection of observations

> 1405.2 + 183.2 = 1588.4,

< 1405.2 − 183.2 = 1222.0.

Thus Chauvenet’s criterion would lead to the rejection of 1630 which confirms
our conclusion that it rejects somewhat too easily.


* Biometrika, Vol. XI. p. 134.

† Using the ordinary formula for the probable error which, as we have seen in the last example, is
accurate enough for our purpose.

‡ We have chosen this example not because we think a craniologist would think of rejecting the
observation 1630 but to show (i) that the suggested criterion gives a result in accordance with the usual
practice of the craniologist and (ii) that Chauvenet’s criterion rejects somewhat too easily.

Using method (3) we have $\tfrac{1}{2}(1630 + 1545) = 1587.5$,

$$\frac{1587.5 - \text{mean}}{\sigma} = \frac{1587.5 - 1405.2}{84.1} = 2.17,$$

whence $\tfrac{1}{2}(1 - \alpha) = .015$,

$17\,(.015) = .255$,

which does not justify rejection.


(γ) As our last example we take 423 observations made by Dr R. A. Houston
on the Colour Vision of male students, the Rayleigh test being used*. The variate 
is here the logarithm of the ratio of the intensity of red to green which appeared 
to the subject to be necessary in order that the combined colour might match 
a given yellow. 





  Logarithm of Ratio                  Logarithm of Ratio
  of Red to Green (x)    Frequency    of Red to Green (x)    Frequency

       −2.93†                1              −.33                  9
       −1.78                 1              −.28                 37
       −1.68                 1              −.23                 77
       −1.63                 1              −.18                112
       −1.38                 1              −.13                 94
                                            −.08                 38
        −.73                 2              −.03                 14
        −.68                 3               .02                  8
        −.63                 3               .07                  4
        −.58                 5               .12                  1
        −.53                 2               .17
        −.48                 4
        −.43                 0               .62
        −.38                 2               .87
                                            1.07

                                           Total                423








There are five cases at one end of the scale and three at the other which 
appear to be anomalous; let us test these. We find 
$\bar{x} = -.20305$,
$\sigma = .24316$.
Taking the difference between the observation .62 and the preceding one and
dividing by the standard-deviation we have

$$\frac{.62 - .17}{.24316} = \frac{.45}{.24316} = 1.851.$$
We have not a table of probabilities for differences between third and fourth 
individuals, but we know that the values given by Table III will be too large. 
A glance at this table is sufficient to assure us accordingly that the probability 


* Proc. R. Soc. 102 A, 1922-23, p. 353. 
† I.e. > (−2.95) and < (−2.90).







of such a difference arising purely by chance in a sample from a normal population
is so small that it falls outside our table, that is to say it is less than .0005.
Thus the observations .62, .87 and 1.07 would be rejected and we can conclude
that these three people have some peculiar anomaly of vision.

To test the observation −1.38 we have

$$\frac{1.38 - .73}{\sigma} = \frac{.65}{.24316} = 2.673,$$

so that the same conclusion applies a fortiori to the five observations at the
beginning of the data. There is no doubt that all the eight outlying observations
are here anomalies.

Chauvenet’s criterion leads to the same result.

We have

$$\alpha_k = \frac{2m-1}{2m} = \frac{845}{846} = .99882,$$

$\tfrac{1}{2}(1 + \alpha_k) = .99941$,
$k\sigma = 3.24\sigma = .788$.

Hence we must reject observations
> (−.203 + .788) = .585,
< (−.203 − .788) = −.991,
which leads to the same conclusion.
Method (3) gives the following values of $\tfrac{1}{2}(1 - \alpha)$.

  Observations    Deviation from Mean ÷ σ    ½ (1 − α)

     −2.93              −11.23             $1.46 \times 10^{-29}$
     −1.78               −6.50             $4.00 \times 10^{-11}$
     −1.68               −6.09             $5.64 \times 10^{-10}$
     −1.63               −5.88             $2.05 \times 10^{-9}$
     −1.38               −4.85             $6.17 \times 10^{-7}$
     + .62                3.37             $3.76 \times 10^{-4}$
       .87                4.40             $5.41 \times 10^{-6}$
      1.07                5.22             $8.95 \times 10^{-8}$


It is clear that we must reject all these observations except −1.38 and +.62
which perhaps require further testing.


Taking the next observation into consideration we have

$\tfrac{1}{2}(-1.38 - .73) = -1.055$,

$$\frac{\text{Deviation from Mean}}{\sigma} = \frac{.852}{\sigma} = 3.50,$$

$\tfrac{1}{2}(1 - \alpha) = .00025$,
and the observation must be rejected.

To test +.62 we have

$\tfrac{1}{2}(.17 + .62) = .395$,

$$\frac{.395 - (-.203)}{\sigma} = \frac{.598}{\sigma} = 2.46,$$

$\tfrac{1}{2}(1 - \alpha) = .007$,
or in 423 observations we should expect three greater than .395. How many
should we expect greater than .5?

$$\frac{.5 - (-.203)}{\sigma} = \frac{.703}{\sigma} = 2.89,$$

$\tfrac{1}{2}(1 - \alpha) = .002$,
or we should expect .8 observations > .5,
and similarly .4 observations > .55.


Thus this test considered by itself might lead to the retention of the
observation .62 if we did not take into consideration the great gap between it and the
previous one, .17. But, allowing for the gap, it should, we conclude, be rejected.











NOTES ON AN EXPERIMENTAL TEST OF ERRORS IN 
PARTIAL CORRELATION COEFFICIENTS, DERIVED FROM 
FOURFOLD AND BISERIAL TOTAL COEFFICIENTS. 


By ETHEL M. NEWBOLD. 


I, 


IN many biometric and social investigations we are obliged to deal with 
variables which cannot be expressed quantitatively, and yet nevertheless partial 
correlation is necessary to unravel the skein of their interrelations. The total 
correlation coefficients in these cases can only be found by using biserial, tetra- 
choric or other fourfold r’s or coefficients of contingency, so that the question 
arises, how far are we justified in drawing conclusions from partial coefficients 
based on total coefficients of this kind? The present note describes an experi- 
mental test of this question, which was made for the Committee of Industrial 
Health Statistics of the Medical Research Council. The greater part of the 
arithmetical computation has been done by my fellow workers on the staff of 
that Committee—Mr E. Lewis-Faning, Mr J. Martin and Miss C. Thomas—and 
our thanks are due to Dr Major Greenwood and Dr L. Isserlis for helpful 
suggestions and advice. 





In these tests we have confined ourselves chiefly to tetrachoric r’s but have
also included some biserials. As regards the values of total tetrachoric coefficients,
tests have previously been made on approximately normal distributions by
Professor Pearson* and also by the late W. R. Macdonell†. The agreement with
product moment values was good. Professor Pearson found also that the probable
error of tetrachoric r increases with the distance of the dividing lines from the
mean, but not very rapidly, and that the probable error of the tetrachoric r is 1.5
to 2 times the probable error as found by the product moment method, and also
that it is not necessarily the case, in various positions of the dividing lines, that 
the smaller the probable error, the more nearly will the tetrachoric agree with 
the product moment coefficient. Deviations between the tetrachoric and product 
moment values arise from want of normality and not from errors of sampling 
(except in so far as these may affect the normality of the sample), hence the 
question of a suitable criterion for judging the reliability of these partial corre- 
lations needs consideration. 

It seems clear first that, since the ordinary method of partial correlation is 
equivalent to correlating deviations from regression straight lines, and so involves 


* Phil. Trans. A, 195, pp. 1–47.
† Biometrika, Vol. I. pp. 177–227.
total coefficients found by the product moment method, we must for our present
purpose regard our coefficients as approximations to the product moment values, 
without in any way disparaging the claim that any total fourfold or other r may 
have to be considered as in itself a reasonable scale of association; such as, for 
example, Professor Pearson has suggested for his fourfold r deduced from “equality
of improbability*,” and also for his $Q_5$ coefficient†.


We have, therefore, in these tests found the differences between the partial 
r’s derived from totals calculated by tetrachoric or other methods and the product 
moment partials. We might then have compared these differences to their 
probable errors (or, rather, to approximate expressions for the latter, since they 
involve the unknown, but presumably high correlation in errors between product 
moment and tetrachoric r’s derived from the same sample). This criterion does
not however, for reasons given below, seem to give the most direct answer to the 
practical point at issue. A rather simpler way of putting the question is to ask 
whether an observer A, who bases his judgment on the product moment partial 
and its probable error, is likely to come to a different practical conclusion from B, 
who bases his judgment on the tetrachoric partial and its probable error. Since, 
however, the deviations between the two kinds of partials from the same sample 
depend primarily on the departure from normality in the distributions and not 
on sampling errors, and cannot be got rid of in a skew distribution by any multi- 
plication of observations, it is, perhaps, more useful to consider the deviations 
observed in these tests either absolutely or relatively to the value of r, quite
apart altogether from the probable errors of the particular data used here. Then, 
so far as the results may be regarded as typical of the sort of distributions 
examined, these tests will serve to give some idea of the order of the average 
differences between partials from the different methods to be expected in such 
distributions, i.e. the average displacement of the mean value about which the 
sampling errors fluctuate. Whether or not this displacement is likely to be of 
practical importance in any proposed case is a question to which a general answer 
cannot be given as it depends on the sampling error of the tetrachoric partial 
found in that particular case, and the probable errors of the values in our tests, 
depending as they do on the size of the particular sets of data chosen, are quite 
irrelevant. The probable errors given in the tables, therefore, are only meant as 
illustrations of particular cases and do not affect any general conclusions. 


In a practical case, of course, since the tetrachoric would not be used if a 
quantitative classification were available, we have no means of knowing the 
distribution and may be dealing with a very skew surface; hence in our choice of 
material we have purposely included some variables which are very far from 
fulfilling the assumptions made in the approximate methods, and in one case give 


* “On a Novel Method of regarding the Association of two Variates classed solely in alternative
Categories.” Math. Cont. to the Theory of Evolution XVIII. Drapers’ Co. Research Memoirs, Biom.
Series, VII. 1912, p. 4.

† “On the Correlation of Characters not quantitatively measurable.” Math. Cont. to the Theory of
Evolution VII. Phil. Trans. A, 195, 1900, pp. 17 and 18.























very artificial distributions, so that the deviations found in that case may be 
looked on as extreme. The other source of large deviations—extreme divisions— 
can always be recognised, hence as Professor Pearson has already illustrated it* 
we have avoided this as far as possible, and taken the dividing lines in every case 
as near the median as the data allowed. 


II. DESCRIPTION OF TESTS.


Three sets of data were used for the experimental tests.


Test I. The material chosen for the first test (taken from “A Biometric
study of the Inter-relations of ‘Vital Capacity,’ Stature, Stem Length and Weight
in a sample of healthy male adults” by Cripps, Greenwood and Newbold,
Biometrika, Vol. XIV. p. 327), practically fulfils the assumptions made in both the
approximate methods, as the approach to normal distributions in all the variables
is fairly close (with the exception of a slight skewness in weight, and to a lesser
degree in chest), and the deviations from linearity are small (see p. 327, loc. cit.).
The position of the points of division also was taken very near the mean in each
case, and the number of observations (950) is respectable, so that the conditions
are about as favourable for the tetrachoric and biserial methods as they are likely
to be in any practical case involving five variables.

Table I gives the results in detail and the values of $\beta_1$ and $\beta_2$ for the five
variables. The average absolute differences are:


                                            Tetrachoric    Tetrachoric    Biserial
                 Tetrachoric    Biserial        and        and Product    and Product
                                              Biserial        Moment        Moment

  Totals            .035          .014           –              –             –
  Partials
   1st order        .056          .019          .024           .018          .013
   2nd order        .071          .021          .035           .025          .015
   3rd order        .079          .047          .032           .027          .024



The largest deviation is −.175 for the tetrachorics and .06 for the biserial r’s,
both in the 3rd order coefficients.

There seems to be no very definite tendency in any of these methods to
deviate from the equal distributions of positive and negative differences, and the
final order of accuracy of the methods is as would be expected:


(1) Biserial and Product Moment.
(2) Biserial alone.
(3) Tetrachoric and Product Moment.
(4) Tetrachoric and Biserial.
(5) Tetrachoric.


The deviations of the biserial from the product moment are on an average
rather less than half those of the tetrachoric.














* loc. cit. p. 42.
If we consider the particular case of this sample, there is only one case
—Tvs,13,—Which might lead to any practical difference in conclusions based on 
the respective r’s and their probable errors*, which is a very favourable result 
since the number in this sample is large, and with smaller samples deviations of 
the order observed here would be of still less practical importance compared 
with the sampling error. 


Test II a and b. The material used in the second test was taken from “A
Study of Index Correlations” by Brown, Greenwood and Wood, published in the
Journal of the Royal Statistical Society, Vol. LXXVII. (1913–14). (See Tables II a
and II b.)

TABLE IIa.


1 = Births, 2 = Population, 3 = Deaths.


Number of Observations = 1000.


  Subscripts     Product                                   Tetrachoric      Q₅
  of r           Moment          Tetrachoric       Q₅      − Product        − Product
                                                           Moment           Moment

  12             .935 ± .003     .969 ± .005      .978       .035             .043
  13             .733 ± .010     .938 ± .008      .941       .204             .208
  23             .782 ± .008     .959 ± .004      .965       .176             .183

  12.3           .852 ± .006     .713 ± .021       –        −.139              –
  13.2           .011 ± .021     .118 ± .042       –         .108              –
  23.1           .400 ± .018     .583 ± .028       –         .182              –





Values of B, and B,. 


Bi Bo 
Population “089 2-023 
Births... 337 2-665 
Deaths ... 16°020 49°819 


This material was purposely chosen as being very unsuitable for either tetra- 
choric or biserial methods. The distributions are artificially restricted at both 
ends by excluding districts with extreme values of the population, and they are 
also in some cases extremely skew. In spite of the theoretical inapplicability of 


* The standard deviation of a partial r derived from coefficients found by any method whatever is 
given approximately by an equation of the form 

$$\sigma^2_{r_{12.3}} \times F_1 = \sigma^2_{r_{12}} \times F_2 + \sigma^2_{r_{13}} \times F_3 + \sigma^2_{r_{23}} \times F_4 + [\delta r_{12}\,\delta r_{13}]\,F_5 + [\delta r_{12}\,\delta r_{23}]\,F_6 + [\delta r_{13}\,\delta r_{23}]\,F_7,$$

where the F’s are all functions of the r’s (see Dr Heron’s note, Biometrika, Vol. VII. p. 412). If therefore we 
assume that the mean products $[\delta r_{12}\,\delta r_{13}]$ etc. of the tetrachoric r’s are in the same ratio $k^2$ to the mean 
products found by the product moment formula (Phil. Trans. A, 191, pp. 229 et seq.), as the squares of 
the tetrachoric standard deviations are to the squares of the standard deviations found by the product 
moment formula, then it follows that the standard deviation of a partial tetrachoric is k times the 
standard deviation found by the product moment formula. In our data the average value of k for the 
standard deviations of the total coefficients is 1.85; we have therefore, in order to obtain a rough 
approximation to the standard deviations of the tetrachoric partials, adopted the above assumption and 
taken k as 2. 





Experimental Test of Errors in Correlation Coefficients 


TABLE II b. 

1 = Birth-Rates, 2 = Population, 3 = Death-Rates. 

Number of Observations = 1000. 

Subscripts   Product           Biserial    δr: Biserial
of r         Moment                        - Product Moment
12            .138 ± .021       .178          .041
13           -.015 ± .021       .019          .034
23            .148 ± .021       .178          .030

12.3          .141 ± .021       .178          .037
13.2         -.036 ± .021       .014          .050
23.1          .151 ± .021       .177          .026

Values of $\beta_1$ and $\beta_2$. 

               $\beta_1$    $\beta_2$
Birth Rates      .312        5.897
Death Rates    61.539      131.137


the methods, and the fact which is pointed out* by the authors of the paper 
from which the data were taken, that the material is eminently unsuitable for a 
basis for practical conclusions, both methods come out of the test well, if we 
consider the relative errors. Owing to the nature of the distributions, the value 
of biserial r for population and births came out greater than unity. To test the 
biserial method we therefore used birth-rates and death-rates instead of births 
and deaths, these also gave very skew distributions (see Table II b). 


The average absolute differences here of the tetrachorics were: for the three 
totals .138, and for the three partials .143; these are considerably higher than 
those in the normal material, but the values of r are here so high that the 
differences are of no practical importance. The average absolute differences between 
the biserial and product moment values were: .035 for the three totals, and .038 
for the three partials; they are large relatively to r but insignificant in this case 
when the probable errors are taken into account. 
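
The first-order partials of Test II a can be checked against the printed totals with a few lines of Python. This is only a sketch: it assumes the usual formula for a first-order partial, r12.3 = (r12 - r13 r23) / sqrt((1 - r13^2)(1 - r23^2)), and it starts from the three-place totals of Table II a, so the results agree with the tabulated partials only to within the rounding of those totals. The function and variable names are illustrative and are not taken from the paper.

```python
import math


def partial_r(r_ab, r_ac, r_bc):
    """First-order partial correlation r_ab.c from three total coefficients."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))


def first_order_partials(totals):
    """Return the three first-order partials for variables 1, 2, 3."""
    r12, r13, r23 = totals["12"], totals["13"], totals["23"]
    return {
        "12.3": partial_r(r12, r13, r23),
        "13.2": partial_r(r13, r12, r23),
        "23.1": partial_r(r23, r12, r13),
    }


# Totals as printed in Table II a (1 = births, 2 = population, 3 = deaths).
product_moment = {"12": 0.935, "13": 0.733, "23": 0.782}
tetrachoric = {"12": 0.969, "13": 0.938, "23": 0.959}

pm = first_order_partials(product_moment)
tc = first_order_partials(tetrachoric)

for key in pm:
    # Deviation of the tetrachoric partial from the product moment partial.
    print(key, round(pm[key], 3), round(tc[key], 3), round(tc[key] - pm[key], 3))
```

Although the tetrachoric totals differ from the product moment totals by at most about .2, the partial r12.3 moves by roughly -.14 and r23.1 by roughly +.18, which is the behaviour the table records.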

Test III. In both Tests I and II a the total correlation coefficients found by 
the product moment method were large. 

The data for the third test were taken from 8 diametric measurements of the 
female pelvis made by Dr Emmons and used by Dr de Souza (Biometrika, Vol. IX. 
1913, p. 486), since, in contrast to those used in Tests I and II, the product 
moment coefficients of correlation were fairly small, only 4 out of 28 being larger 
than .50. 


* loc. cit. p. 325. The authors’ object was to compare the results of two arithmetical processes 
when the nature of the material was such that a more extreme divergence than was likely to arise in 
real statistical practice could be anticipated. The paper was not intended to be a contribution to the 
study of the relation between birth-rates and death-rates. 
The 28 total coefficients of correlation were found by Dr de Souza for every 
diameter taken with every other diameter in turn. The total correlation 
coefficients used here were taken as given in the paper quoted above (see Table III). 
They are only given there to two places, but it was not thought worth while to 
recalculate them. In order to avoid introducing further errors, the tetrachorics 
were found to three places and the final results are shown here to three places, 
though not accurate to more than two in the case of the product moment 
coefficients. As before, the dividing lines were taken as near the median as 
possible, but in the case of some of the variates, e.g. 3 ($x_3$), 6 ($x_6$) and 8 ($x_8$), the 
division was necessarily very unequal. (The class frequencies in these cases of the 
216 observations were: for $x_3$, 69 and 147; for $x_6$, 82 and 134; for $x_8$, 83.5 and 
132.5.) 

The frequency distributions might all be samples from normal distributions 
since the number of observations is only 216, but the results for our purposes can 
be regarded as typical of deviations from distributions whose skewness is described 
by the observed values of $\beta_1$ and $\beta_2$ considered apart from their probable errors in 
this particular case. 


The average absolute deviations of the tetrachorics are: 
    Totals                .065, 
    Partials, 1st order   .086, 
        "     2nd order   .099, 
        "     3rd order   .118, 
and the largest individual differences of the third order are .257 and -.253. 
These are, as might be expected, rather larger than those in the more normal 
material of Test I, and as the third column shows, the relative errors are in some 
cases very big. In any case, however, where the probable errors are of the order 
of those in this particular sample, the deviations due to the approximate method 
would not be larger than those due to random sampling. 


In the material of Tests II and III a comparison was also made with Professor 
Pearson’s empirical coefficient $Q_5$* for a fourfold table: 

$$Q_5 = \sin\left(\frac{\pi}{2}\cdot\frac{1}{\sqrt{1+k^2}}\right), \quad\text{where}\quad k^2 = \frac{4abcd\,N^2}{(ad-bc)^2\,(a+d)\,(b+c)}.$$

This formula is much quicker to use than the tetrachoric method, and Dr Brownlee 
kindly lent us his manuscript table of $Q_5$ for the values of log $k^2$. The values of 
$Q_5$ found in Test III for the 28 total r’s hardly differ at all from those of the 
tetrachorics even when the division is far from median. Even in the very skew 
material of Test II $Q_5$ and the tetrachoric r agree much more closely with each 
other than either does with the product moment, and this suggests that as far as 
these tests go, the extra time taken in finding the tetrachoric rather than the 
$Q_5$ is not compensated for by extra accuracy, especially as in cases where we cannot 

* Phil. Trans. A, 195, 1900, 











TABLE III. 

Number of Observations = 216. 

Subscripts   Product        Tetrachoric       Q5      δr: Tetrachoric     δr′: Q5
of r         Moment                                   - Product Moment    - Product Moment
12            .79 ± .02      .786 ± .04      .787        -.004               -.003
13            .61 ± .03      .533 ± .06      .565        -.077               -.045
14            .24 ± .04      .207 ± .07      .209        -.033               -.031
15            .17 ± .04      .115 ± .07      .115        -.055               -.055
16            .30 ± .04      .134 ± .07      .136        -.166               -.164
17            .10 ± .05      .053 ± .07      .053        -.047               -.047
18            .11 ± .05      .043 ± .07      .044        -.067               -.066
23            .52 ± .03      .435 ± .07      .454        -.085               -.066
24            .16 ± .04      .125 ± .07      .125        -.035               -.035
25            .13 ± .05      .028 ± .07      .029        -.102               -.101
26            .18 ± .04      .159 ± .07      .160        -.021               -.020
27            .07 ± .05      .058 ± .07      .059        -.012               -.011
28            .13 ± .05      .137 ± .07      .139         .007                .009
34            .09 ± .05      .060 ± .08      .062        -.030               -.028
35            .07 ± .05      .012 ± .08      .013        -.058               -.057
36            .35 ± .04      .297 ± .07      .318        -.053               -.032
37            .32 ± .04      .290 ± .07      .304        -.030               -.016
38            .17 ± .04      .300 ± .07      .323         .130                .153
45            .91 ± .01      .919 ± .02      .935         .009                .025
46            .31 ± .04      .231 ± .07      .236        -.079               -.074
47            .20 ± .04      .213 ± .07      .214         .013                .014
48            .21 ± .04      .392 ± .07      .406         .182                .196
56            .30 ± .04      .226 ± .07      .229        -.074               -.071
57            .19 ± .04      .258 ± .07      .258         .068                .068
58            .21 ± .04      .322 ± .07      .328         .112                .118
67            .32 ± .04      .192 ± .07      .194        -.128               -.126
68            .49 ± .03      .616 ± .05      .617         .126                .127
78            .36 ± .04      .331 ± .07      .335        -.029               -.025

              .784 ± .02     .783 ± .04        -         -.001                 -
              .380 ± .04     .374 ± .08        -         -.007                 -
              .609 ± .03     .533 ± .07        -         -.076                 -
              .188 ± .05     .193 ± .09        -          .005                 -
              .162 ± .05     .183 ± .09        -          .021                 -
             -.120 ± .05    -.195 ± .09        -         -.075                 -
              .088 ± .05     .088 ± .09        -          .000                 -
              .161 ± .05     .128 ± .09        -         -.033                 -
              .117 ± .05    -.030 ± .09        -         -.147                 -
              .262 ± .04     .016 ± .09        -         -.246                 -
              .004 ± .05     .028 ± .09        -          .024                 -
              .008 ± .05    -.145 ± .09        -         -.153                 -
              .514 ± .03     .432 ± .08        -         -.083                 -
              .078 ± .05     .033 ± .09        -         -.045                 -
             -.038 ± .05    -.222 ± .09        -         -.184                 -
             -.098 ± .05     .095 ± .09        -          .193                 -
              .071 ± .05     .182 ± .09        -          .111                 -
              .008 ± .05     .006 ± .09        -         -.002                 -
              .003 ± .05    -.0002 ± .09       -         -.003                 -
             -.029 ± .05    -.110 ± .09        -         -.081                 -
              .221 ± .04     .269 ± .09        -          .048                 -
              .305 ± .04     .256 ± .09        -         -.049                 -
              .131 ± .05     .328 ± .08        -          .197                 -
              .121 ± .05     .270 ± .09        -          .149                 -
              .901 ± .01     .915 ± .02        -          .014                 -
              .910 ± .01     .920 ± .01        -          .010                 -
              .909 ± .01     .923 ± .01        -          .015                 -
              .299 ± .04     .224 ± .09        -         -.075                 -
              .290 ± .04     .216 ± .09        -         -.074                 -








Tetrachoric 
100 ér 


| 
Ee. 
ooo 


-~I 


| 
bo 


DD Hrs BO WW & Os 
NNO kK COW 


obo © 














Sub- 
scripts 
of r 


47.6 


48.2 
48.3 
56.3 
57.6 
58.3 
58.2 
68.1 
68.3 


12.45 
13.45 
14.23 
14.65 
15.36 
16.23 
17.65 
18.36 
23.45 
34.25 
36.12 
38.12 
38.25 
46.23 
46.35 
47.65 
48.25 
48.35 
58.36 
68.12 
68.35 


14.567 
15.368 
16.234 
23.451 
38.245 
48.356 
68.123 


Product 
Moment 


112+°05 
193 + 05 
"198 + °05 
*295 + 04 


104+ 05 


-202 + 05 
"196 + -05 
"482 + -03 
-466 + 03 


‘786 + 02 
610 + 03 
-200 + 04 
192 +04 
"134+ -05 
"165 + 05 
‘005 + *05 
“053 + 05 
D14 + 03 


013 + 05 
230+ “04 
126 + 05 
"123 + °05 
"302 + -04 
‘076 + 05 
“042 + 05 
036 + °05 
‘037 +05 
076+ 05 
493 + 03 
“435 4-04 





"192+ °04 


138 + 05 
1134-05 
070+ °05 
122 + 05 
004 + *05 
.480 ± .05 
TABLE III (continued). 


Number of Observations =216. 


Tetrachoric 


“09 
‘08 
+°08 
+ 09 


5+ 09 


+ 08 
+ 08 
+06 


78+ -06 


74+ °04 


07 
+°09 
+09 
09 
+ 09 
“09 
+09 
‘08 
‘09 
‘09 
‘08 
‘08 
“O9 
09 
09 
“09 
‘09 
+09 
06 
+ -06 


‘O09 
‘09 
‘09 
+°09 
“08 
+:°09 
+°06 





Q; Tetrachoric 
» — Product 
Moment 











6r 


065 
188 
194 
062 
121 
"132 
125 
134 
112 


6r’ 


Qs 
— Product 
Moment 











Tetrachoric 
100 or 





1602 | 
132°0 
— 26°6 
— 67:9 
— 272°3 
—541°9 
526°8 
231°4 
24°3 
25°6 
33°0 
35°2 


— 225°2 











Pelvic Measurements            Values of $\beta_1$     Values of $\beta_2$
1 = Intercrests                  .089 ± .060           3.01 ± .269
2 = Interspines                  .103 ± .088           3.34 ± .451
3 = Transverse                   .122 ± .100           3.34 ± .499
4 = Diagonal conjugate           .030 ± .024           3.02 ± .250
5 = Obstetric conjugate          .043 ± .035           3.02 ± .256
6 = Antero-posterior             .051 ± .040           2.98 ± .243
7 = Inter-tubers                 .038 ± .029           2.93 ± .219
8 = Post sagittal                .700 ± .499           4.90 ± (illegible)


use the product moment 7 we are not usually in a position to judge if the 
material is normal or not. 
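
As an illustration of why $Q_5$ is quick to obtain, the coefficient can be computed directly from the four cell frequencies. The sketch below assumes the form Q5 = sin(π/2 · 1/√(1 + k²)) with k² = 4abcd N² / ((ad - bc)² (a + d)(b + c)), with the cells laid out as a, b in the first row and c, d in the second. Two further choices are assumptions of this sketch and are not taken from the paper: the sign of ad - bc is attached to the result, and a table with an empty cell is given k² = 0. The function name `q5` is hypothetical.

```python
import math


def q5(a, b, c, d):
    """Pearson's empirical coefficient Q5 for a fourfold table [[a, b], [c, d]]."""
    n = a + b + c + d
    cross = a * d - b * c
    if cross == 0:
        # No association: k^2 is infinite and the sine tends to zero.
        return 0.0
    if a * b * c * d == 0:
        # Assumption: an empty cell gives k^2 = 0, i.e. complete association.
        k2 = 0.0
    else:
        k2 = 4 * a * b * c * d * n**2 / (cross**2 * (a + d) * (b + c))
    magnitude = math.sin((math.pi / 2) / math.sqrt(1 + k2))
    # Assumption: the coefficient takes the sign of ad - bc.
    return math.copysign(magnitude, cross)


# A median-divided table of 100 observations, purely illustrative figures.
print(round(q5(40, 10, 10, 40), 3))
```

For the illustrative table above, k² = 16/9, so that 1/√(1 + k²) = .6 and $Q_5$ = sin 54°, or about .809.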


The whole set of results is summarised in Table IV, grouped according to 
the size of the product moment r’s. There seems to be some tendency in these 
data for the deviations to be rather larger when the correlation coefficients are 
smaller. 


Groupings 


(a) Total 7’s 
‘00 to *29 
“30 to *59 
*60 to “99 
(b) 1st Partials 
00 to *29 
*30 to *59 
‘60 to ‘99 
(e) 2nd Partials 
“00 to -29 
*B0 to *59 
‘60 to “99 
(d) 3rd Partials 
‘00 to *29 
*B0 to 59 
*60 to “99 


Totals 


TABLE IV. 


Mean absolute ér Mean absolute 




















Number of Cases for Tetrachoric 6r’ for Qs T Mean ‘ 
Paes Mise etrachoric 
Qs | dr x 100 
_——| Test | Test | Test | Test | Test i 
Test Test Test Test | Test I II Ill | I III — oo 
I II Ill II | III | | Test Lil 
| 
— 16 — | 16 - — 061 ‘063 36°7 
8 -—- 9 - 9 | -035 — ‘O87 — ‘O78 24°5 
2 3 3 3 3 “034 ‘139 | -030 144 ‘024 4°7 
r 1 | 3 | — | — | 070 | 208 | ops | — | — 1755 
6 1 5 — 049 «= *182 | -077 a = 171 
2 l 5 — — 047 «| -*139 | 0238 | — eee 3°4 
2 — 15 a 1110 | — | 106 | — — 176°4 
ef ae : —}]— | 056 . yearns 
2 -- 2 alt Ma: *050 — 049 — -- 78 
| 
| 
4 6 — | — | 079 — | '139 — | — 10769 
— — 1 —_— | —|j— — | -096 ay — 20°0 
—_ _ —- |= | = aes | =e 30. pao om 
cacti | 
32 6 94 3 |} 2 | 











On the whole the results of these experimental tests are not such as to shake 
our confidence in partial coefficients derived from fourfold or biserial r’s, but, 
as has been said before, these tests cannot go further than to indicate the average 
order of the deviations to be expected from material of the degree of skewness 
investigated, and the practical importance or otherwise of these deviations must 
be judged by the accuracy required and otherwise obtainable in each individual 


case. 


The following method of attacking the problem theoretically has been suggested 
by Dr Isserlis. It consists in examining the function representing the relative 
increment in a partial coefficient corresponding to given small relative increments 
in the total coefficients to see if this function has any maximum value or values 
within the possible range, and if so, what such values are. 


Consider the algebraic function 

$$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}},$$

not for any particular population or samples but for all possible values of 
$r_{12}$, $r_{13}$, $r_{23}$ between 0 and +1 for which $r_{12.3}$, $r_{13.2}$ and $r_{23.1}$ are also 
between these limits. Suppose that small increments $h_{12}$, $h_{13}$, $h_{23}$ (which for 
the time being we will consider as arbitrary, examining later our assumptions 
with reference to the present problem) are given to $r_{12}$, $r_{13}$ and $r_{23}$ and that 
$h_{12.3}$ is the corresponding increment in $r_{12.3}$. Then to 1st order terms only: 

$$h_{12.3} = \frac{1}{(1 - r_{13}^2)^{1/2}\,(1 - r_{23}^2)^{1/2}} \left\{ h_{12} - h_{13}\,\frac{r_{23} - r_{12} r_{13}}{1 - r_{13}^2} - h_{23}\,\frac{r_{13} - r_{12} r_{23}}{1 - r_{23}^2} \right\} \qquad (1),$$

or, taking the relative increment, which is of more practical interest: 

$$\frac{h_{12.3}}{r_{12.3}} = \frac{h_{12}}{r_{12}}\cdot\frac{r_{12}}{r_{12} - r_{13} r_{23}} - \frac{h_{13}}{r_{13}}\cdot\frac{r_{13}\,(r_{23} - r_{12} r_{13})}{(1 - r_{13}^2)\,(r_{12} - r_{13} r_{23})} - \frac{h_{23}}{r_{23}}\cdot\frac{r_{23}\,(r_{13} - r_{12} r_{23})}{(1 - r_{23}^2)\,(r_{12} - r_{13} r_{23})} \qquad (2).$$

If we suppose the relative increments $h_{12}/r_{12}$, $h_{13}/r_{13}$, $h_{23}/r_{23}$ to be 
constant and differentiate expression (2) to find its maxima we reach the result 
that the only values for which (2) could be a maximum or minimum are either 

        $r_{12} = r_{13} r_{23}$, 
   or   $r_{13} = r_{12} = 0$, 
   or   $r_{23} = r_{12} = 0$, 
   or   $r_{12} = r_{13} = r_{23} = 0$. 

In each of these cases $r_{12.3} = 0$ so that the relative error is of no interest. The 
conclusion therefore is that for all the r’s within the range 0 to +1 the relative 
error (so far as it is given by the approximate expression (2)) has no maximum 
value. The same conclusion is found to be true of the absolute error, if in this 
case we suppose the absolute increments $h_{12}$, $h_{13}$, $h_{23}$ to be constant. The function 

$$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}$$

may however be discontinuous for certain values of the r’s. 
When either $r_{13}$ or $r_{23}$ is unity, then $r_{12}$ is equal to either $r_{23}$ or $r_{13}$ as the case may 
be and the function $(r_{12} - r_{13} r_{23}) / (\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2})$ is indeterminate. Values of either or both 
of the two secondary correlations $r_{13}$ or $r_{23}$ approaching unity call therefore for 
fuller consideration. Before going into this case, we will first see how far we are 
justified in applying the above conclusions to our present problem. The r’s are 
product moment r’s and the h’s we take to be the differences between the values 
of r as found by the product moment and, say, the tetrachoric methods; they 
are not to be confused with either sampling errors, or with the differential 
increments which we give to the r’s in the process of finding a maximum. Our 
experimental results seem to justify the assumption that the h’s are small 
compared to the r’s except when the r’s themselves are very small. With regard to 
our other assumptions, it may be objected that we cannot vary the r’s without 
varying the h’s too, and that correlational relations exist between the h’s, so that 
we have no right to assume either $h_{12}/r_{12}$ etc. or $h_{12}$ etc. constant. This would, of 


course, be perfectly true if our universe or field of variation were confined to 
samples from one single population and if the h’s were sampling errors in either 
product moment or tetrachoric r’s, and it might also be true in the same confined 
universe of samples from a single population if the h’s were given their present 
meaning of differences between tetrachoric and product moment values (we are 
not able to investigate this theoretically without a knowledge of the frequency 
distributions of the original variables in the population, and our experimental 
results do not cover samples from the same population). The point however does 
not, I think, arise here, since the universe we are considering is not limited to 
any particular population and its samples, but comprises all possible populations 
of all degrees of skewness and of correlation likely to arise in practice, with their 
samples. Hence to any particular set of r values can correspond any number of 
different values of the h’s, these values depending, as we have seen, mainly on the 
deviations from normality in all the different original distributions rather than 
on the values of the r’s. And vice versa, to any particular set of h values can 
correspond any number of different values of the r’s depending on the actual 
correlation between the variables, which can take all values from 0 to +1. Hence 
it seems not unreasonable to assume that the expression (1) above is a continuous 
function of the r’s in the still wide universe limited only by picking out a definite 
arbitrary constant set of values of $h_{12}$, $h_{13}$ and $h_{23}$, or that (2) is a continuous 
function of the r’s in another universe limited only by picking out definite 
arbitrary constant values of $h_{12}/r_{12}$, $h_{13}/r_{13}$, $h_{23}/r_{23}$. 

The fundamental question we ask is therefore: Given certain relative or 
absolute values of the deviations h, are there any particularly dangerous sets of r 
to which these can be applied? Dangerous, that is, as regards the size of the 
resulting relative or absolute deviation in $r_{12.3}$. The answer (only an approximate 
one) is that so far as regards the value of a quantity which (except near the 
limits 0 and +1 of the total r’s) closely approximates to the deviation in question, 
there are no such dangerous values; hence within these limits we can safely say 
there are no particularly dangerous ones for the true deviations. We now go back 
to the cases when either $r_{13}$ or $r_{23}$ approaches unity. Numerical illustrations of the 
large fluctuations in partial coefficients which can be obtained by giving small 
arbitrary deviations to the total coefficients in these cases are given in Table V. 
The possibility of such deviations will then be considered. 
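
The instability near unity can be reproduced numerically. The sketch below takes the first fictitious set of Table V (totals .96, .97, .99 with arbitrary deviations .03, .01, .003) and compares the exact change in r12.3 with the change given by the first-order expression (1). It assumes that expression (1) has the form h12.3 = {h12 - h13 (r23 - r12 r13)/(1 - r13²) - h23 (r13 - r12 r23)/(1 - r23²)} / sqrt((1 - r13²)(1 - r23²)), which is what differentiating the partial coefficient gives. The function names are illustrative only.

```python
import math


def partial_r(r12, r13, r23):
    """First-order partial correlation r12.3."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13**2) * (1 - r23**2))


def first_order_increment(r12, r13, r23, h12, h13, h23):
    """Linearised change in r12.3 for small changes h in the three totals."""
    scale = 1.0 / math.sqrt((1 - r13**2) * (1 - r23**2))
    return scale * (
        h12
        - h13 * (r23 - r12 * r13) / (1 - r13**2)
        - h23 * (r13 - r12 * r23) / (1 - r23**2)
    )


# Fictitious set (1) of Table V: totals and arbitrary deviations.
r = (0.96, 0.97, 0.99)
h = (0.03, 0.01, 0.003)

before = partial_r(*r)
after = partial_r(*(ri + hi for ri, hi in zip(r, h)))
exact_change = after - before
linear_change = first_order_increment(*r, *h)

print(round(before, 3), round(after, 3))
print(round(exact_change, 3), round(linear_change, 3))
```

The partial moves from about -.009 to about .717 although no total changes by more than .03, and the first-order expression gives only about .50 of the exact change of about .73. This is in keeping with the remark that the approximation cannot be trusted near the limits of the total r’s.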





Table V also illustrates the fact, obvious enough a priori, that $r_{12.3}$ is not 
unreliable when $r_{12}$ alone is approximately equal to unity, i.e. to get instability 
the high total correlation must involve the variable kept constant. The deviations 
in the total coefficients in Table V are all arbitrary and were chosen to give large 
deviations in the partials, but they are mostly of about the order of the sampling 
errors of the respective totals. These considerations led me at first sight into the 
error of altogether mistrusting partial coefficients derived from such high totals 
even in the case of product moment r’s. That this suspicion of entire unreliability 
is however not well founded in the case of sampling errors of product moment r’s 
TABLE V. 


Arbitrary Deviations. 





| re ee 








i | . Partials } | awed . 
i - or | Partials co or ny | 2 ti dar | 
| | (arbitrary) | from r oa br (Partial) : x i oe sa 
| | lie andl Ie ie ia a = eae ca) (ada ac oe 
| fv» 96 | 03 | — 009 “717 726) 
| (1) 471s | 97 | O01 | 496 | --184 | -680! | 100| 282 10-6 
| Lies | ‘99 | -003 | 863 ‘12 | — 051) 
| (7 "40 “05 — ‘231 “418 648) | 100 31:3 10-7 
| (2) 471s “45 ~ +05 ‘318 | —°361 | --647} 
lr: 98 ‘Ol ‘997 “990 013) | 200 | 62°7 10-38 
| | \ 
| (re , 571600 + 083 0684. | — ‘473 "112 585 | 
(3) j7is | 9513134012 | - -011313 943 897 | —-046). 30 | 28°5 10-6 
| Urey | 7085064 -061 | --058506/ 651 | “185 | —-466) 
| 
(72 809521 +031 | — 029521 044 | --'347 — 391) 
(4) 473 | 964803 +4 -006 ‘005197 "894 ‘931 ‘037 | 55 | 681) -080 
| Livy | “832386 + 028 ‘017614 333 614 281 | 
|g, [712 | 888242 + 027 ‘026758 | — °630 5081135) 
(5) 4 ry; | 9956264001 | — 001626! -988 974 |- O14} 30/179 | 107-38 
ro, | 911441 +°021 | — 021441 732 —-321 |-1053) 
_ {72 | *970+ 003 003 "825 ‘878 053) 
| (6) irs | *906+ 909 | —-02 202 | —-276 | —-478. | 163 | 29 10-6 
1” Lng | 914 009 02 339 | -672 | -333) | 





Sources of Data. 

(1) and (2) fictitious. 

(3) p. 21, (4) p. 22 and (5) p. 29 of E. C. Snow, M.A., “The Intensity of Natural Selection 
in Man,” Drapers’ Co. Research Memoirs, Studies in National Deterioration, pp. 1-43. 

(6) Greenwood and Newbold, Journal of Hygiene, Vol. XXI. No. 4, August 1923, Table IV, 
p. 444. 
is seen when the correlation in errors between total coefficients is taken into 
account, and it was found that though any one of the above deviations in the 
total 7’s is not unlikely to arise as an error of sampling, in every case the com- 
bination of the three would be a most unlikely event. The last column of Table V 
gives a rough approximation* to the probability in each case of a set of deviations 
in the three total r’s equally or more improbable than the arbitrary ones given in 
the second column. These probabilities are negligibly small in every case but 
one, and even in that case though P is .080, N is only 55, so that the deviations of 
$r_{12.3}$ and $r_{23.1}$ are not more than 2.88 and 2.32 times their respective standard 
deviations. These few cases cannot of course prove that the large deviations in 
the partials would always be found to be so unlikely, but any other result would 

* This approximation has been made by assuming a normal correlation solid in fourfold space for the 
distribution of errors in the three total coefficients, and finding $\chi^2$ and P in the usual way. The 
assumption of normality is of course not justified for such high values of r, but in the absence of the requisite 
knowledge for dealing with the appropriate skew solid, this method may serve as a rough guide. 
It probably underestimates the value of P, but the values found leave room for this. 
be inconsistent with the general theorem proved by Yule* that the distribution 
of sampling errors of a partial coefficient is the same as that of a total coefficient. 
The demonstration of this theorem (which Fisher† has since emended by showing 
that in the case of the partial distribution n should be replaced by n - s, where s is 
the number of variables kept constant) is quite general and should apply just as 
well to the case where the total coefficients are high as to any other. In fact, 
this case has been tested experimentally by Bispham‡ for samples of 30, in which 
the secondary total correlations were as high as .977 and the mean partial 
coefficient approximately zero; he found that his observed distributions of partial 
coefficients gave a reasonable agreement with the predicted values based on the 
cooperative paper from the Galton Laboratory§ on total correlation coefficients. 
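
The effect of replacing n by n - s can be illustrated on the figures of Table II a. The sketch below rests on an assumption that the paper does not state at this point: that the "±" values in the tables are probable errors of the classical large-sample form .6745 (1 - r²)/√n. That form does reproduce the printed values for the product moment coefficients of Table II a, but it is used here only as an illustration, and the function name is hypothetical.

```python
import math


def probable_error(r, n, s=0):
    """Large-sample probable error of a correlation coefficient.

    s is the number of variables kept constant (0 for a total coefficient),
    so that n is replaced by n - s for a partial coefficient.
    """
    return 0.6745 * (1 - r**2) / math.sqrt(n - s)


n = 1000  # number of observations in Test II

# Totals of Table II a (product moment).
for r in (0.935, 0.733, 0.782):
    print(r, round(probable_error(r, n), 3))

# First-order partials of Table II a, one variable kept constant.
for r in (0.852, 0.011, 0.400):
    print(r, round(probable_error(r, n, s=1), 3))
```

With n = 1000 the correction from n to n - 1 is immaterial, but for samples of 30 such as Bispham’s, or for partials of higher order, it is no longer negligible.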


TABLE VI. 


Observed Deviations due to Tetrachorics. 





: or Partials Partials Approxi- 
ae | r Tetrachoric from aon . x i ' mate 
Tetrachoric — Product Product : * (Partial) | ~ x order of 
Moment | Tetrachorics } P 
| | 


Moment Moments 


v2) 933582 + 006 | 9622504 012 | 028668 | 5694-032 -299+4-087* —-27 ) | 
(1) ra| ‘900105 + 009 | ‘9652024 -011 | -065097 | ‘8074-017 _°739+ 043 —-068) | 200/314 10-7 
Pog |—-022+-048 3994-080 _-421) | 


| | 
| 


| | 
| 
966318 + 003 | (9813564 -007 | | -015038 | 
| | 
7\2\ ‘969810 + 003 | 953441 +015 |— 016369 | °825+ 017 665 +°059 + —°160 | 
| 
| 
| 








(2) rg , 9063604 -009 | -919303+-022 | 012943 | -202+-051 | --225+-100 |—-427} |163| 90-7) 10-% 
ro! | 9139214 009 | -978574 + 008 | 064836 | 3394-047 | 8634-027 524) | 
| | 

















* See footnote on p. 255. 


Values of B, and p,. 


By By 
(1) Deaths age 0—1 “i x, 2°697 5121 
Deaths age 1—2__..... @, 4°224 7040 
Deaths all ages Soy a3 6°758 L197 
(2) Oxygen intake aha a, 0255 2°282 
Carbon dioxide output ws, ‘O103 2°185 
Work per minute... wy, ‘4078 2°283 


For true sampling errors then, there appears to be no danger in high total 
coefficients. Discussion as to what kinds of deviations can be regarded as true 
sampling errors is outside the scope of this paper, but it is clear that the deviations 
arising from the use of tetrachoric instead of product-moment r’s are not 
sampling errors; hence in the case of tetrachoric partials there is no reason why 
* Proc. Roy. Soc. A, LXXIX. pp. 182-193, 1907. “On the Theory of Correlation for any number 
of Variables treated by a new System of Notation,” by G. Udny Yule. 
† Metron, Vol. III. No. 3, 1924, pp. 329-333. “The Distribution of the partial Correlation 
Coefficient,” by R. A. Fisher. 
‡ Metron, Vol. II. No. 4, 1923, pp. 684-696. “An experimental Determination of the Distribution 
of the partial Correlation Coefficient in Samples of thirty,” by J. W. Bispham. 
§ Biometrika, Vol. XI. pp. 328-413. “On the Distribution of the Correlation Coefficient in small 
Samples.” A Cooperative Study by H. E. Soper, A. W. Young, B. M. Cave, A. Lee and K. Pearson. 
deviations from product moment partials as large as the fictitious ones in Table V 
should not arise. That they do in fact sometimes arise when high totals are used 
is shown by the two sets of actual observations given in Table VI. 

In the first set all three variables are very skew, in the second set one is 
moderately skew but none is normal. The first set consists of the male deaths 
ages 0—1 years, 1—2 years, and all ages respectively in the 10 years 1901—1910 
in the first 200 Registration Districts of England as given in the Decennial 
Supplement, omitting those in which the total male deaths exceeded 9500; the 
second set consists of the 163 observations of oxygen intake, carbon dioxide 
output and work done per minute cited above in the last example in Table V. 
Three out of the six partials have deviations of .4 to .5. The deviations from 
product moment values in the total coefficients clearly do not (and there is no 
apparent reason why they should) follow the same correlated distributions as 
sampling errors, since the approximate probabilities of the observed or more 
unlikely sets arising as errors of sampling are negligibly small (see last column of 
Table VI). The danger of getting quite unreliable partials from tetrachoric or 
other approximate methods in cases where the secondary totals are high does 
therefore exist, but it is not however likely to arise often in practice owing to the 
comparative rarity of these very high correlations. 


Summary. 


(1) Experimental tests have been made, on three sets of material of varying 
degrees of abnormality, of the deviations from product moment partials likely to 
arise in partials derived from total coefficients found by other methods. 

(2) The average order of the deviations found is summarised in Table IV. 

(3) In the particular samples examined the deviations, though in some cases 
large, are not of practical importance compared with the sampling errors. 

(4) Search for maximum values of the deviations in partials by the usual 
algebraic method fails to reveal any particularly dangerous combinations of total 
coefficients except very high values of secondary totals, and the unreliability of 
tetrachoric partials in such cases is confirmed by tests on two further sets of 
observations. 


Note on Miss Newbold’s Memoir. 


The previous paper has been published because it raises a number of most 
interesting points and has involved clearly a very large amount of computing work. 
But the Editor considers and he thinks the author would admit that it cannot 
be looked upon as a final treatment of the topic. It is not only (a) that excessively 
anormal distributions have been used as test material and (b) that these are in several 
cases obtained by very artificial truncations, destroying any approach to linearity, 
but that the theory which makes $h_{12}/r_{12}$, $h_{13}/r_{13}$, $h_{23}/r_{23}$ constants is quite inadequate, 
even for errors of observation as apart from deviations of sampling. Innumerable 
quantitative tables each with its own product-moment value of r will give the 
same tetrachoric r, and a triply infinite series of product-moment partials will 
give the same tetrachoric partial r. The first steps that are needed in analysis are 
(i) to determine the sampling distribution of a total tetrachoric r and (ii) to find 
the correlation between total product-moment r and total tetrachoric r. These 
problems do not seem beyond the present state of analysis. We can then possibly 
proceed to the like problems for partial tetrachorics. Meanwhile the Editor ventures 
to think that a table like Table VI is very liable to misinterpretation. The 
distributions of product-moment r from .90 to .98 etc. are known, and the probable errors 
provided in the table do not really represent them. Still less is this true for the 
tetrachoric totals. What may be the distribution of differences between product- 
moment and tetrachoric r’s is a quite unsolved problem. 

In practical statistics so much work has been done that we know fairly closely 
now what is the nature of the distributions for a great variety of characters. We 
know that deaths with age, births with time, weights with age, population, birth- 
rates and death-rates do not give normal distributions or linear regressions. There 
is another wide variety of characters which we know do fulfil these conditions. 
When we come to deal practically with statistical data even when we know the 
class it is always advisable to study correlation by two (or more, if possible) different 
methods, and only speak with some confidence of the results if they are confirmatory. 
It is very rarely that some forms of the correlation ratio, of mean square contingency, 
of biserial r, etc. cannot be used in conjunction, so that the investigator has some 
confirmatory evidence of the extent of the association. This remark seems to be 
called for by the statement on p. 252, that “in a practical case...we have no means 
of knowing the distribution and may be dealing with a very skew surface.” 

One last remark, the total tetrachoric coefficient assumes normality in the popu- 
lation sampled. The sample to which we apply the product-moment method is not 
necessarily normal even if we sample from a normal population. If we take a 
fourfold sample from a normal population, we can shuffle and ruffle the contents of each 
of the four quadrants in a great variety of ways, so that the normality of the original 
population is disguised from the standpoint of the product-moment sample, while 
preserved in the tetrachoric. Such “ruffling,” especially in the case of extreme 
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observations due to eccentricities of observation, or errors of nomenclature, or pre- 
judices of the market (in economic matters) or idiosyncrasies of legislators or judges 
(in criminological data), may render the tetrachoric correlation and its partials, not- 
withstanding higher probable errors, more reliable than product-moment 7’s. It is 
precisely the same conception as that involved in the statement that a median 
estimate of judgments by a mixed group of jurors is more reliable than their 
average. This remark has been made because a careless reader of Miss Newbold’s 
paper might conceive that product-moment correlation has some prescriptive pre- 
eminence, that position can only be given to the correlation ratio and not to the 
correlation coefficient. Correlation analysis is a name covering an extensive “instru- 
mentarium” and only a wide experience can really enable the statistical worker 
to determine what tool to use under given circumstances; it will depend not only 
on the grouping of the data, but on what is often more important the class of data 
to which they belong and the nature of the recorders. K. P. 











THE FIFTEEN CONSTANT BIVARIATE 
FREQUENCY SURFACE. 


By KARL PEARSON. 


(1) Introductory. At the conclusion of a paper “On a certain Hypergeometrical 
Series and its Representation by Continuous Frequency Surfaces” (Biometrika, 
Vol. xvi. p. 185), I suggested that we should not get really satisfactory skew- 
frequency surfaces until we had enough constants at our disposal to determine 
independently all moments up to the fourth order. The hypergeometrical series 
with double variation considered in that memoir contained—apart from a constant 
determined by the volume—only four independent constants, which were thus not 
enough to give absolute freedom to the correlation $r$, and the marginal constants 
$\beta_1$, $\beta_2$ and $\beta_1'$, $\beta_2'$ of the two variates*. 


In order to give freedom to $r$, $\beta_1$, $\beta_2$, $\beta_1'$, $\beta_2'$ it was needful to still further 
generalise the double hypergeometrical series. This I did in a very simple manner. 
I supposed a population of $N$ individuals, $m$ of them being marked. I then drew 
a sample of $n$ individuals and did not return them to the bag. I now placed in 
the bag $m'$ more marked individuals and drew a sample of $n'$. If $z_{ss'}$ be the 
frequency of occurrence of samples of $s$ marked individuals in the first sample of $n$ 
and $s'$ marked individuals in the second sample of $n'$, then $z_{ss'}$ determined a double 
hypergeometrical series with five constants $N$, $m$, $m'$, $n$ and $n'$ which can be chosen 
so as to give the $r$, $\beta_1$, $\beta_2$, $\beta_1'$, $\beta_2'$ of any system of observation. 
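
The two-stage drawing just described can be put into a few lines of code. The sketch below is only an illustration under stated assumptions: the function name `double_hypergeometric`, its argument names and the numerical values in the usage note are hypothetical, and it returns the exact probability of each pair $(s, s')$, which would have to be multiplied by the number of repetitions to give frequencies.

```python
from fractions import Fraction
from math import comb


def double_hypergeometric(N, m, n, m_added, n2):
    """Return {(s, s2): probability} for the two-stage urn scheme.

    Stage 1: draw n of the N individuals (m of them marked), no replacement.
    Stage 2: put m_added further marked individuals in the bag, then draw n2.
    All names here are illustrative, not taken from the memoir.
    """
    table = {}
    for s in range(min(m, n) + 1):
        unmarked_1 = n - s
        if unmarked_1 > N - m:
            continue  # not enough unmarked individuals for this s
        p1 = Fraction(comb(m, s) * comb(N - m, unmarked_1), comb(N, n))
        marked_left = m - s + m_added
        unmarked_left = (N - m) - unmarked_1
        total_left = marked_left + unmarked_left
        for s2 in range(min(marked_left, n2) + 1):
            unmarked_2 = n2 - s2
            if unmarked_2 > unmarked_left:
                continue
            p2 = Fraction(comb(marked_left, s2) * comb(unmarked_left, unmarked_2),
                          comb(total_left, n2))
            table[(s, s2)] = p1 * p2
    return table
```

For instance `double_hypergeometric(20, 8, 5, 4, 6)` tabulates the series for a bag of 20 with 8 marked, a first sample of 5, 4 marked individuals added, and a second sample of 6; the second sample must not exceed what is then in the bag.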

I failed, however, to obtain integrable differential equations to the surface 
“parallel” to the spacial histogram thus determined, just as I failed many years 
ago to integrate the equations resulting from the simpler double hypergeometrical. 
In despair of achieving any result in this way, I applied Stirling’s Theorem, as 
Laplace had done to the simple single hypergeometrical series, and evaluated $z_{ss'}$ 
in a similar manner. The result was a simple normal bivariate surface of which 
the constants had their usual values in $r$, $\sigma_1$ and $\sigma_2$, but the ordinate of this normal 
surface was multiplied by a polynomial with linear and cubic approximative terms. 





* The position of the double mean and the values of the standard deviations are not here included 
in the constants, as the first is a question of how the series is imposed on the observations and the latter 
are really only questions of scale in plotting. 

I then realised that if I took my skew-frequency surface to be 

$$\begin{aligned}
z = z_0\, e^{-\frac{1}{2(1-r^2)}\left(\frac{x^2}{\sigma_1^2} - \frac{2rxy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right)} \Big\{ & 1 - a_0 + a_1\frac{x}{\sigma_1} + a_2\frac{y}{\sigma_2} + b_1\frac{x^2}{\sigma_1^2} + 2b_2\frac{xy}{\sigma_1\sigma_2} + b_3\frac{y^2}{\sigma_2^2} \\
& + c_1\frac{x^3}{\sigma_1^3} + c_2\frac{x^2y}{\sigma_1^2\sigma_2} + c_3\frac{xy^2}{\sigma_1\sigma_2^2} + c_4\frac{y^3}{\sigma_2^3} \\
& + d_1\frac{x^4}{\sigma_1^4} + d_2\frac{x^3y}{\sigma_1^3\sigma_2} + 3d_3\frac{x^2y^2}{\sigma_1^2\sigma_2^2} + d_4\frac{xy^3}{\sigma_1\sigma_2^3} + d_5\frac{y^4}{\sigma_2^4} \Big\},
\end{aligned} \tag{i}$$

I should have enough constants*, namely $\sigma_1$, $\sigma_2$, $r$, $a_0$, $a_1$, $a_2$, $b_1$, $b_2$, $b_3$, $c_1$, $c_2$, $c_3$, $c_4$, 
$d_1$, $d_2$, $d_3$, $d_4$, $d_5$, or 18 constants in all to determine: 

(i) the total volume; (ii) and (iii) the position of the mean; (iv) and (v) the 
two variate standard deviations; (vi) the coefficient of correlation; (vii), (viii), (ix) 
and (x) the four marginal $\beta$’s, $\beta_1$, $\beta_2$, $\beta_1'$, $\beta_2'$; (xi) and (xii) the two third order 
product moment coefficients about the mean; and (xiii), (xiv), (xv) the three 
fourth order product moment coefficients. This would leave me three constants to 
spare, and I was therefore at liberty to make the constants I had called $r$, $\sigma_1$ and $\sigma_2$ 
the true correlation and the true standard deviations of my two variates. There 
would be no difficulty in determining my fifteen constants $a_0$, $a_1$, $a_2$, $b_1$, $b_2$, $b_3$, $c_1$, $c_2$, 
$c_3$, $c_4$, $d_1$, $d_2$, $d_3$, $d_4$ and $d_5$ for the momental and product momental identities would 
in every case give linear equations, which would only require straightforward if 
very laborious algebra for their solution†. 
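
To make the form of the surface concrete, the ordinate can be evaluated directly. The following is a minimal sketch, assuming the polynomial of Equation (i) carries unit weights on every term except $2b_2$ and $3d_3$ (the weights implied by Equations (iv) and (xvii)); the function name `surface` and the dictionary of coefficients are hypothetical conveniences, and $x$, $y$ are measured from the means.

```python
import math

COEFFICIENT_NAMES = "a0 a1 a2 b1 b2 b3 c1 c2 c3 c4 d1 d2 d3 d4 d5".split()


def surface(x, y, N, s1, s2, r, coef=None):
    """Ordinate z at (x, y) for the fifteen-constant surface (illustrative only)."""
    c = dict.fromkeys(COEFFICIENT_NAMES, 0.0)
    c.update(coef or {})
    u, v = x / s1, y / s2  # variates in units of their standard deviations
    z0 = N / (2 * math.pi * s1 * s2 * math.sqrt(1 - r * r))
    normal = z0 * math.exp(-(u * u - 2 * r * u * v + v * v) / (2 * (1 - r * r)))
    poly = (1 - c["a0"] + c["a1"] * u + c["a2"] * v
            + c["b1"] * u**2 + 2 * c["b2"] * u * v + c["b3"] * v**2
            + c["c1"] * u**3 + c["c2"] * u**2 * v + c["c3"] * u * v**2 + c["c4"] * v**3
            + c["d1"] * u**4 + c["d2"] * u**3 * v + 3 * c["d3"] * u**2 * v**2
            + c["d4"] * u * v**3 + c["d5"] * v**4)
    return normal * poly
```

With every coefficient left at zero the function returns the ordinary normal surface, which is a convenient first check before any fitted constants are supplied.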

Such is the history of my attempt to find a bivariate frequency surface which 
gives all the first fifteen momental constants of any bivariate set of observations. 
It is, perhaps, needless to observe that I am not over-pleased with the result. 
I do not lay great stress on the fact that besides the labour of computing the 
fifteen first momental constants of a set of observations, there will be the great 
labour of investigating the contour lines. All such labour comes in the day’s work 
of the statistician, and as he certainly cannot express the univariate frequency 
distribution by less than four momental constants, so he cannot expect to be let 
off a bivariate distribution with less than fifteen. My main criticism of the surface 
(i) is that although we may have observations and surface in agreement for the 
fifteen constants, there is no evidence that this method of expansion provides a 
series in any way convergent. It certainly must fail for the crateroid and J-sectional 
surfaces, which we have reached in practical statistics. There are many other 
surfaces also which are theoretically limited by lines or triangles, and which it is 
impossible to represent by (i) although they would have the same fifteen moments 
as (i). 

But the full investigation of (i) will serve to throw useful light on the 
legitimacy of expansions of this sort, which have recently been suggested from 
more than one quarter. This matter we shall discuss as the surface is developed. 

(2) Determination of the Constants of Equation (i). I shall adopt the following 
notation. If 

$$p_{ss'} = S\,\{n_{xy}\,(x - \bar{x})^{s}\,(y - \bar{y})^{s'}\}/N,$$ 
then I shall write $q_{ss'} = p_{ss'}/(\sigma_1^{s}\,\sigma_2^{s'})$. 

Here $\bar{x}$ and $\bar{y}$ are the means, $\sigma_1$ and $\sigma_2$ the standard deviations, of 
the variates $x$ and $y$ of the total population $N$; $n_{xy}$ is the frequency of the pair $x$, $y$, 
$= z\,dx\,dy$ when we pass to continuous variation. We shall choose $z_0$ to be given by 

$$z_0 = N/(2\pi\sigma_1\sigma_2\sqrt{1 - r^2}) \tag{ii}$$ 
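
The notation may be illustrated on a small correlation table. This is a sketch under stated assumptions: the function name `standardised_moments`, the dictionary layout and the five-cell table `example` are all hypothetical, chosen only to show how $p_{ss'}$ and $q_{ss'}$ are formed from the cell frequencies $n_{xy}$.

```python
def standardised_moments(freq, max_order=4):
    """freq maps (x, y) -> frequency n_xy.

    Returns N, the means, the standard deviations, and a dictionary q with
    q[(s, t)] = p_st / (sigma_1^s * sigma_2^t) for s + t <= max_order.
    """
    N = sum(freq.values())
    mx = sum(n * x for (x, y), n in freq.items()) / N
    my = sum(n * y for (x, y), n in freq.items()) / N

    def p(s, t):
        # product moment of order (s, t) about the double mean
        return sum(n * (x - mx) ** s * (y - my) ** t
                   for (x, y), n in freq.items()) / N

    s1, s2 = p(2, 0) ** 0.5, p(0, 2) ** 0.5
    q = {(s, t): p(s, t) / (s1 ** s * s2 ** t)
         for s in range(max_order + 1) for t in range(max_order + 1 - s)}
    return N, (mx, my), (s1, s2), q


# A made-up table, purely for illustration.
example = {(0, 0): 4, (1, 0): 2, (0, 1): 2, (1, 1): 4, (2, 1): 1}
```

In this notation `q[(1, 1)]` is the correlation $r$, `q[(3, 0)]` is the signed $\sqrt{\beta_1}$ and `q[(4, 0)]` is $\beta_2$ of the $x$-variate.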


* $z_0$ is not an independent constant and may be chosen at will. 
† The algebraic theory of the 15-constant surface with its marginal totals and regression lines was 
given in the Lent Term of this year (1925) in academic lectures. 

For a normal bivariate distribution we have*: 

$$\begin{aligned}
& q_{10} = q_{01} = 0, \quad q_{20} = q_{02} = 1, \quad q_{30} = q_{03} = 0, \quad q_{40} = q_{04} = 3, \quad q_{50} = q_{05} = 0, \\
& q_{60} = q_{06} = 15, \quad q_{70} = q_{07} = 0, \quad q_{80} = q_{08} = 105, \quad q_{11} = r, \quad q_{21} = q_{12} = 0, \\
& q_{31} = q_{13} = 3r, \quad q_{41} = q_{14} = 0, \quad q_{51} = q_{15} = 15r, \quad q_{61} = q_{16} = 0, \quad q_{71} = q_{17} = 105r, \\
& q_{22} = 1 + 2r^2, \quad q_{32} = q_{23} = 0, \quad q_{42} = q_{24} = 3(1 + 4r^2), \quad q_{52} = q_{25} = 0, \\
& q_{62} = q_{26} = 15(1 + 6r^2), \quad q_{33} = 3r(3 + 2r^2), \quad q_{43} = q_{34} = 0, \quad q_{53} = q_{35} = 15r(3 + 4r^2), \\
& q_{44} = 3(3 + 24r^2 + 8r^4).
\end{aligned} \tag{iii}$$

These are the values that must be used to obtain the integrals, when we 
integrate the right-hand side of Equation (i). But as we shall never need to use 
these symbols $q$ on the right-hand side of our results, but always put in their 
values in terms of the correlation coefficient r, we may reserve the symbols q for 
the observations themselves, only emphasising that they will not then be restricted 
to any special values as in the case of normal correlation. 
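
The normal product moments needed for the integrations can be regenerated mechanically. The sketch below assumes the standard reduction $E[x^{s}y^{t}] = (s-1)\,E[x^{s-2}y^{t}] + rt\,E[x^{s-1}y^{t-1}]$ for a pair of normal variates in standard units, which is not the route taken in the memoir cited; the function name `normal_q` is hypothetical. Exact fractions are used so that the closed forms can be compared without rounding.

```python
from fractions import Fraction
from functools import lru_cache


def normal_q(s, t, r):
    """q_st = E[x^s y^t] for normal variates in standard units, correlation r."""

    @lru_cache(maxsize=None)
    def q(s, t):
        if s < 0 or t < 0:
            return Fraction(0)
        if s == 0 and t == 0:
            return Fraction(1)
        if s == 0:
            # pure moments of y: 1, 0, 1, 0, 3, 0, 15, ...
            return (t - 1) * q(0, t - 2)
        # lower the power of x by one step
        return (s - 1) * q(s - 2, t) + r * t * q(s - 1, t - 1)

    return q(s, t)
```

Calling `normal_q(4, 4, Fraction(1, 2))`, for example, can be set against $3(3 + 24r^2 + 8r^4)$ with $r = \tfrac12$.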


We first note that if we multiply (i) by an odd power or product in $x$ and $y$, 
we shall obtain only an equation involving $a_1$, $a_2$, $c_1$, $c_2$, $c_3$, $c_4$. On the other hand, 
if we multiply by an even power or product in $x$ and $y$, we shall obtain an equation 
involving $a_0$, $b_1$, $b_2$, $b_3$, $d_1$, $d_2$, $d_3$, $d_4$, $d_5$. 


We next note two further points: (i) that the origin is taken to be the mean, 
(ii) that in multiplying to integrate we shall multiply by 
$(x/\sigma_1)^{s}\,(y/\sigma_2)^{s'}\,d(x/\sigma_1)\,d(y/\sigma_2)$ 
(varying $s$ and $s'$), and always of course integrate between the limits $-\infty$ to $+\infty$ 
of both variates, although in practice we may have absolute knowledge that $x$ and $y$ 
can only cover a much more limited range of values†. In such cases we suppose 
the observed frequency zero without this range, but we still equate the frequency 
constants for a theoretical surface of infinite spread to the observed constants over 
a limited spread. We cannot therefore anticipate good results from such a method, 
when the limits of the observational spread are just the loci where the frequency 
is most emphasised, e.g. cloudiness at two stations, or vision as tested in the usual 
way by Snellen’s types of right and left eyes‡. There is therefore considerable 
opening for criticism even in the limits chosen for the integrations. 

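Multiply (i) by $x/\sigma_1$ and integrate; we have by (iii): 

$$\int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} z\,\frac{x}{\sigma_1}\,d\!\left(\frac{x}{\sigma_1}\right) d\!\left(\frac{y}{\sigma_2}\right) = N\frac{\bar{x}}{\sigma_1} = 0 = N\{a_1 + ra_2 + 3c_1 + 3rc_2 + (1 + 2r^2)c_3 + 3rc_4\}.$$ 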

* Biometrika, Vol. xii. pp. 86-87. 

† For example, correlation between the numbers of cards of the same suit in two players’ hands: the 
possible range is the right-angled triangle of sides 0 to 13 and 0 to 13. 

‡ Another good illustration is that of the percentage errors made in bisecting a line. Clearly in 
measuring an individual’s accuracy, we can only take a positive percentage error, for the individual 
who makes a negative error is as inaccurate as one who makes a positive error, and we cannot treat the 
population as a whole as absolutely accurate. Our percentage errors are therefore limited to the range 
0 to 50. The maximum frequency is close to zero, or we get a J-shaped distribution. If we now measure 
the error in trisecting a line the range of percentage error is limited in a singular manner, which I will 
not enter into here, but there is again a lumping up of frequency in the region of zero and we get for 
the correlation surface of relation between bisectional and trisectional accuracy in the same individual a 
J-sectioned frequency surface. 

Hence: 
$$0 = a_1 + ra_2 + 3c_1 + 3rc_2 + (1 + 2r^2)c_3 + 3rc_4 \tag{iv}$$ 
Similarly for the $y$ mean: 
$$0 = ra_1 + a_2 + 3rc_1 + (1 + 2r^2)c_2 + 3rc_3 + 3c_4 \tag{v}$$ 
We now reach in the same manner two equations by multiplying by $(x/\sigma_1)^3$ and 
by $(y/\sigma_2)^3$, and integrating. We have: 
$$\sqrt{\beta_1} = 3a_1 + 3ra_2 + 15c_1 + 15rc_2 + 3(1 + 4r^2)c_3 + 3r(3 + 2r^2)c_4 \tag{vi}$$ 
$$\sqrt{\beta_1'} = 3ra_1 + 3a_2 + 3r(3 + 2r^2)c_1 + 3(1 + 4r^2)c_2 + 15rc_3 + 15c_4 \tag{vii}$$ 
Here $\sqrt{\beta_1}$ and $\sqrt{\beta_1'}$ must be given the same signs as $\mu_3$ and $\mu_3'$, the third 
moment coefficients of the $x$ and $y$ marginal totals. 


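Lastly we multiply by $(x/\sigma_1)^2(y/\sigma_2)$ and $(x/\sigma_1)(y/\sigma_2)^2$ respectively, and thus obtain the 
values of $q_{21}$ and $q_{12}$ for the observed data. We find: 

$$q_{21} = 3ra_1 + (1 + 2r^2)a_2 + 15rc_1 + 3(1 + 4r^2)c_2 + 3r(3 + 2r^2)c_3 + 3(1 + 4r^2)c_4 \tag{viii}$$ 
$$q_{12} = (1 + 2r^2)a_1 + 3ra_2 + 3(1 + 4r^2)c_1 + 3r(3 + 2r^2)c_2 + 3(1 + 4r^2)c_3 + 15rc_4 \tag{ix}$$ 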


The six linear equations (iv) to (ix) suffice to determine the six coefficients $a_1$, 
$a_2$, $c_1$, $c_2$, $c_3$, $c_4$ of the odd powers in the polynomial of Equation (i). We need not 
detain the reader with the steps in the solution, but place here the results: 

$$a_1 = \frac{1}{2(1 - r^2)^2}\{3rq_{21} - \sqrt{\beta_1} - (1 + 2r^2)q_{12} + r\sqrt{\beta_1'}\} \tag{x}$$ 
$$a_2 = \frac{1}{2(1 - r^2)^2}\{3rq_{12} - \sqrt{\beta_1'} - (1 + 2r^2)q_{21} + r\sqrt{\beta_1}\} \tag{xi}$$ 
$$c_1 = \frac{1}{6(1 - r^2)^3}\{r^2(3q_{12} - r\sqrt{\beta_1'}) - (3rq_{21} - \sqrt{\beta_1})\} \tag{xii}$$ 
$$c_2 = \frac{1}{2(1 - r^2)^3}\{(1 + 2r^2)q_{21} - r\sqrt{\beta_1} - r((2 + r^2)q_{12} - r\sqrt{\beta_1'})\} \tag{xiii}$$ 
$$c_3 = \frac{1}{2(1 - r^2)^3}\{(1 + 2r^2)q_{12} - r\sqrt{\beta_1'} - r((2 + r^2)q_{21} - r\sqrt{\beta_1})\} \tag{xiv}$$ 
$$c_4 = \frac{1}{6(1 - r^2)^3}\{r^2(3q_{21} - r\sqrt{\beta_1}) - (3rq_{12} - \sqrt{\beta_1'})\} \tag{xv}$$ 


For checking numerical work, it may be of service to remark that: 
$$a_1 = -3c_1 - 2rc_2 - c_3 \quad\text{and}\quad a_2 = -c_2 - 2rc_3 - 3c_4 \tag{xvi}$$ 
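
The odd constants can be checked by substitution rather than by repeating the algebra. The sketch below assumes the closed forms (x) to (xv) as read here, with $\sqrt{\beta_1}$ and $\sqrt{\beta_1'}$ carrying their signs; the function names and the trial values in the usage note are hypothetical. It integrates the odd part of the polynomial against the normal surface, using the normal moments of (iii), and so recovers the third order constants that were put in.

```python
from fractions import Fraction


def q(s, t, r):
    """Normal product moment q_st of (iii), by stepping down the power of x."""
    if s < 0 or t < 0:
        return Fraction(0)
    if s == 0:
        return Fraction(1) if t == 0 else (t - 1) * q(0, t - 2, r)
    return (s - 1) * q(s - 2, t, r) + r * t * q(s - 1, t - 1, r)


def odd_constants(r, rb1, rb1p, q21, q12):
    """a1, a2, c1, c2, c3, c4 from the closed forms; rb1, rb1p = signed sqrt(beta_1)'s."""
    k = 1 - r * r
    a1 = (3*r*q21 - rb1 - (1 + 2*r*r)*q12 + r*rb1p) / (2 * k**2)
    a2 = (3*r*q12 - rb1p - (1 + 2*r*r)*q21 + r*rb1) / (2 * k**2)
    c1 = (r*r*(3*q12 - r*rb1p) - (3*r*q21 - rb1)) / (6 * k**3)
    c2 = ((1 + 2*r*r)*q21 - r*rb1 - r*((2 + r*r)*q12 - r*rb1p)) / (2 * k**3)
    c3 = ((1 + 2*r*r)*q12 - r*rb1p - r*((2 + r*r)*q21 - r*rb1)) / (2 * k**3)
    c4 = (r*r*(3*q21 - r*rb1) - (3*r*q12 - rb1p)) / (6 * k**3)
    return a1, a2, c1, c2, c3, c4


def odd_moment(s, t, r, consts):
    """Moment of order (s, t), in standard units and per unit N, of the odd terms."""
    a1, a2, c1, c2, c3, c4 = consts
    terms = [(a1, 1, 0), (a2, 0, 1), (c1, 3, 0), (c2, 2, 1), (c3, 1, 2), (c4, 0, 3)]
    return sum(c * q(s + i, t + j, r) for c, i, j in terms)
```

With, say, $r = \tfrac25$, $\sqrt{\beta_1} = \tfrac3{10}$, $\sqrt{\beta_1'} = -\tfrac15$, $q_{21} = \tfrac1{10}$ and $q_{12} = \tfrac14$, the moments of orders (3, 0), (0, 3), (2, 1) and (1, 2) return exactly these values, the first order moments vanish, and relation (xvi) holds.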


We now proceed to the even moments and products. Integrating (i) as it 
stands, we have: 


$$N = N(1 - a_0 + b_1 + 2rb_2 + b_3 + 3d_1 + 3rd_2 + 3(1 + 2r^2)d_3 + 3rd_4 + 3d_5),$$ 
or 
$$a_0 = b_1 + 2rb_2 + b_3 + 3d_1 + 3rd_2 + 3(1 + 2r^2)d_3 + 3rd_4 + 3d_5 \tag{xvii}$$ 

This gives us $a_0$ in terms of the eight even constants. 

The following equations are obtained by multiplying by $x^2/\sigma_1^2$ and $y^2/\sigma_2^2$, and 
integrating: 

$$a_0 = 3b_1 + 6rb_2 + (1 + 2r^2)b_3 + 15d_1 + 15rd_2 + 9(1 + 4r^2)d_3 + 3r(3 + 2r^2)d_4 + 3(1 + 4r^2)d_5 \tag{xviii}$$ 
$$a_0 = (1 + 2r^2)b_1 + 6rb_2 + 3b_3 + 3(1 + 4r^2)d_1 + 3r(3 + 2r^2)d_2 + 9(1 + 4r^2)d_3 + 15rd_4 + 15d_5 \tag{xix}$$ 

Multiplying by $xy/(\sigma_1\sigma_2)$, we deduce: 
$$ra_0 = 3rb_1 + 2(1 + 2r^2)b_2 + 3rb_3 + 15rd_1 + 3(1 + 4r^2)d_2 + 9r(3 + 2r^2)d_3 + 3(1 + 4r^2)d_4 + 15rd_5 \tag{xx}$$ 
The fourth moments of the marginal totals provide the equations: 
$$\begin{aligned}
\beta_2 - 3 = {} & -3a_0 + 15b_1 + 30rb_2 + 3(1 + 4r^2)b_3 + 105d_1 + 105rd_2 + 45(1 + 6r^2)d_3 \\
& + 15r(3 + 4r^2)d_4 + 3(3 + 24r^2 + 8r^4)d_5,
\end{aligned} \tag{xxi}$$ 
$$\begin{aligned}
\beta_2' - 3 = {} & -3a_0 + 3(1 + 4r^2)b_1 + 30rb_2 + 15b_3 + 3(3 + 24r^2 + 8r^4)d_1 \\
& + 15r(3 + 4r^2)d_2 + 45(1 + 6r^2)d_3 + 105rd_4 + 105d_5.
\end{aligned} \tag{xxii}$$ 


We have now to obtain the fourth order product moments by multiplying by 
$x^3y/(\sigma_1^3\sigma_2)$, $x^2y^2/(\sigma_1^2\sigma_2^2)$ and $xy^3/(\sigma_1\sigma_2^3)$ respectively, and integrating. We deduce: 

$$\begin{aligned}
q_{31} - 3r = {} & -3ra_0 + 15rb_1 + 6(1 + 4r^2)b_2 + 3r(3 + 2r^2)b_3 + 105rd_1 + 15(1 + 6r^2)d_2 \\
& + 45r(3 + 4r^2)d_3 + 3(3 + 24r^2 + 8r^4)d_4 + 15r(3 + 4r^2)d_5,
\end{aligned} \tag{xxiii}$$ 
$$\begin{aligned}
q_{22} - 1 - 2r^2 = {} & -(1 + 2r^2)a_0 + 3(1 + 4r^2)b_1 + 6r(3 + 2r^2)b_2 + 3(1 + 4r^2)b_3 \\
& + 15(1 + 6r^2)d_1 + 15r(3 + 4r^2)d_2 + 9(3 + 24r^2 + 8r^4)d_3 \\
& + 15r(3 + 4r^2)d_4 + 15(1 + 6r^2)d_5,
\end{aligned} \tag{xxiv}$$ 
$$\begin{aligned}
q_{13} - 3r = {} & -3ra_0 + 3r(3 + 2r^2)b_1 + 6(1 + 4r^2)b_2 + 15rb_3 + 15r(3 + 4r^2)d_1 \\
& + 3(3 + 24r^2 + 8r^4)d_2 + 45r(3 + 4r^2)d_3 + 15(1 + 6r^2)d_4 + 105rd_5.
\end{aligned} \tag{xxv}$$ 

By aid of (xvii) $a_0$ was eliminated from Equations (xviii) to (xxv), and these 
equations were then rewritten with the following abbreviations: 

$$B_2 = \tfrac{1}{24}(\beta_2 - 3), \quad B_2' = \tfrac{1}{24}(\beta_2' - 3), \quad Q_{31} = \tfrac{1}{6}(q_{31} - 3r), \quad Q_{13} = \tfrac{1}{6}(q_{13} - 3r), \quad Q_{22} = \tfrac{1}{12}(q_{22} - 1 - 2r^2) \tag{xxvi}$$ 


There resulted the following system of equations, of which the first three 
provided the $b$’s in terms of the $d$’s. These were substituted in the last five, which were 
then solved for the $d$’s. 

$$0 = b_1 + 2rb_2 + r^2b_3 + 6d_1 + 6rd_2 + 3(1 + 5r^2)d_3 + 3r(1 + r^2)d_4 + 6r^2d_5 \tag{xxvii}$$ 
$$0 = r^2b_1 + 2rb_2 + b_3 + 6r^2d_1 + 3r(1 + r^2)d_2 + 3(1 + 5r^2)d_3 + 6rd_4 + 6d_5 \tag{xxviii}$$ 
$$0 = 2rb_1 + 2(1 + r^2)b_2 + 2rb_3 + 12rd_1 + 3(1 + 3r^2)d_2 + 12r(2 + r^2)d_3 + 3(1 + 3r^2)d_4 + 12rd_5 \tag{xxix}$$ 
$$2B_2 = b_1 + 2rb_2 + r^2b_3 + 8d_1 + 8rd_2 + 3(1 + 7r^2)d_3 + r(3 + 5r^2)d_4 + 2r^2(3 + r^2)d_5 \tag{xxx}$$ 
$$2B_2' = r^2b_1 + 2rb_2 + b_3 + 2r^2(3 + r^2)d_1 + r(3 + 5r^2)d_2 + 3(1 + 7r^2)d_3 + 8rd_4 + 8d_5 \tag{xxxi}$$ 
$$\begin{aligned}
2Q_{31} = {} & 4rb_1 + 2(1 + 3r^2)b_2 + 2r(1 + r^2)b_3 + 32rd_1 + (5 + 27r^2)d_2 + 6r(7 + 9r^2)d_3 \\
& + (3 + 21r^2 + 8r^4)d_4 + 4r(3 + 5r^2)d_5,
\end{aligned} \tag{xxxii}$$ 
$$\begin{aligned}
2Q_{13} = {} & 2r(1 + r^2)b_1 + 2(1 + 3r^2)b_2 + 4rb_3 + 4r(3 + 5r^2)d_1 + (3 + 21r^2 + 8r^4)d_2 \\
& + 6r(7 + 9r^2)d_3 + (5 + 27r^2)d_4 + 32rd_5,
\end{aligned} \tag{xxxiii}$$ 
$$\begin{aligned}
6Q_{22} = {} & (1 + 5r^2)b_1 + 4r(2 + r^2)b_2 + (1 + 5r^2)b_3 + 6(1 + 7r^2)d_1 + 3r(7 + 9r^2)d_2 \\
& + 6(2 + 17r^2 + 5r^4)d_3 + 3r(7 + 9r^2)d_4 + 6(1 + 7r^2)d_5.
\end{aligned} \tag{xxxiv}$$ 


The solutions of Equations (xxvii) to (xxxiv) and then (xvii) are as follows: 

$$a_0 = -\frac{3}{(1 - r^2)^2}\{B_2 + B_2' - r(Q_{31} + Q_{13}) + (1 + 2r^2)Q_{22}\} \tag{xxxv}$$ 
$$b_1 = -\frac{3}{(1 - r^2)^3}\{2B_2 + 2r^2B_2' - 2rQ_{31} - r(1 + r^2)Q_{13} + (1 + 5r^2)Q_{22}\} \tag{xxxvi}$$ 
$$2b_2 = \frac{3}{(1 - r^2)^3}\{4r(B_2 + B_2') - (1 + 3r^2)(Q_{31} + Q_{13}) + 4r(2 + r^2)Q_{22}\} \tag{xxxvii}$$ 
$$b_3 = -\frac{3}{(1 - r^2)^3}\{2r^2B_2 + 2B_2' - r(1 + r^2)Q_{31} - 2rQ_{13} + (1 + 5r^2)Q_{22}\} \tag{xxxviii}$$ 
$$d_1 = \frac{1}{(1 - r^2)^4}\{B_2 + r^4B_2' - rQ_{31} - r^3Q_{13} + 3r^2Q_{22}\} \tag{xxxix}$$ 
$$d_2 = -\frac{1}{(1 - r^2)^4}\{4rB_2 + 4r^3B_2' - (1 + 3r^2)Q_{31} - r^2(3 + r^2)Q_{13} + 6r(1 + r^2)Q_{22}\} \tag{xl}$$ 
$$d_3 = \frac{1}{(1 - r^2)^4}\{2r^2(B_2 + B_2') - r(1 + r^2)(Q_{31} + Q_{13}) + (1 + 4r^2 + r^4)Q_{22}\} \tag{xli}$$ 
$$d_4 = -\frac{1}{(1 - r^2)^4}\{4r^3B_2 + 4rB_2' - r^2(3 + r^2)Q_{31} - (1 + 3r^2)Q_{13} + 6r(1 + r^2)Q_{22}\} \tag{xlii}$$ 
$$d_5 = \frac{1}{(1 - r^2)^4}\{r^4B_2 + B_2' - r^3Q_{31} - rQ_{13} + 3r^2Q_{22}\} \tag{xliii}$$ 
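
The even constants admit the same check by substitution. The sketch below assumes the closed forms (xxxv) to (xliii) as read here and the weights $2b_2$ and $3d_3$ in the polynomial; the function names and the trial values in the usage note are hypothetical. It confirms that the total, the second order moments and the five fourth order constants are reproduced exactly.

```python
from fractions import Fraction


def q(s, t, r):
    """Normal product moment q_st of (iii), by stepping down the power of x."""
    if s < 0 or t < 0:
        return Fraction(0)
    if s == 0:
        return Fraction(1) if t == 0 else (t - 1) * q(0, t - 2, r)
    return (s - 1) * q(s - 2, t, r) + r * t * q(s - 1, t - 1, r)


def even_constants(r, B2, B2p, Q31, Q13, Q22):
    """a0, (b1, b2, b3), (d1, ..., d5) from the closed forms."""
    k, r2 = 1 - r * r, r * r
    a0 = -3 * (B2 + B2p - r*(Q31 + Q13) + (1 + 2*r2)*Q22) / k**2
    b1 = -3 * (2*B2 + 2*r2*B2p - 2*r*Q31 - r*(1 + r2)*Q13 + (1 + 5*r2)*Q22) / k**3
    b2 = 3 * (4*r*(B2 + B2p) - (1 + 3*r2)*(Q31 + Q13) + 4*r*(2 + r2)*Q22) / (2 * k**3)
    b3 = -3 * (2*r2*B2 + 2*B2p - r*(1 + r2)*Q31 - 2*r*Q13 + (1 + 5*r2)*Q22) / k**3
    d1 = (B2 + r**4*B2p - r*Q31 - r**3*Q13 + 3*r2*Q22) / k**4
    d2 = -(4*r*B2 + 4*r**3*B2p - (1 + 3*r2)*Q31 - r2*(3 + r2)*Q13 + 6*r*(1 + r2)*Q22) / k**4
    d3 = (2*r2*(B2 + B2p) - r*(1 + r2)*(Q31 + Q13) + (1 + 4*r2 + r**4)*Q22) / k**4
    d4 = -(4*r**3*B2 + 4*r*B2p - r2*(3 + r2)*Q31 - (1 + 3*r2)*Q13 + 6*r*(1 + r2)*Q22) / k**4
    d5 = (r**4*B2 + B2p - r**3*Q31 - r*Q13 + 3*r2*Q22) / k**4
    return a0, (b1, b2, b3), (d1, d2, d3, d4, d5)


def even_moment(s, t, r, consts):
    """Moment of order (s, t), in standard units and per unit N, of the even terms."""
    a0, (b1, b2, b3), (d1, d2, d3, d4, d5) = consts
    terms = [(1 - a0, 0, 0), (b1, 2, 0), (2*b2, 1, 1), (b3, 0, 2),
             (d1, 4, 0), (d2, 3, 1), (3*d3, 2, 2), (d4, 1, 3), (d5, 0, 4)]
    return sum(c * q(s + i, t + j, r) for c, i, j in terms)
```

Taking for illustration $r = \tfrac13$, $B_2 = \tfrac1{20}$, $B_2' = -\tfrac1{40}$, $Q_{31} = \tfrac1{30}$, $Q_{13} = \tfrac1{50}$ and $Q_{22} = \tfrac1{25}$, the moment of order (4, 0) comes out as $3 + 24B_2$, that of order (2, 2) as $1 + 2r^2 + 12Q_{22}$, and so on.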


Equations (x) to (xv) and (xxxv) to (xliii) are the complete formal solution of 
our problem; they enable us, as soon as the constants 

$$N,\ \sigma_1,\ \sigma_2,\ \sqrt{\beta_1},\ \sqrt{\beta_1'},\ \beta_2,\ \beta_2',\ r,\ q_{21},\ q_{12},\ q_{31},\ q_{13},\ q_{22}$$ 

and the means $\bar{x}$ and $\bar{y}$ are known for our data, to write down a surface, with the 
whole of these fifteen constants (i.e. the complete set of constants up to moments 
of the fourth degree) identically the same as those of our given data. 


(3) Transformation of the Equation (i) into a Second Form. If $Z$ be the function: 

$$Z = \frac{N}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\, e^{-\frac{1}{2(1-r^2)}\left(\frac{x^2}{\sigma_1^2} - \frac{2rxy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right)}$$ 

and we write $x' = x/\sigma_1$, $y' = y/\sigma_2$, it is perfectly easy to write down the various 
differential coefficients: 

$$\frac{d^{\,s+s'}Z}{dx'^{\,s}\,dy'^{\,s'}}.$$ 

These differential coefficients are, or at least are proportional to, the generalised 
tetrachoric functions for two correlated variables. Now if we form all the differential 
coefficients up to the fourth order, it is easy to arrange our surface in the form 

$$\begin{aligned}
z = Z & - \frac{1}{6}\left(\sqrt{\beta_1}\,\frac{d^3Z}{dx'^3} + 3q_{21}\frac{d^3Z}{dx'^2dy'} + 3q_{12}\frac{d^3Z}{dx'dy'^2} + \sqrt{\beta_1'}\,\frac{d^3Z}{dy'^3}\right) \\
& + B_2\frac{d^4Z}{dx'^4} + Q_{31}\frac{d^4Z}{dx'^3dy'} + 3Q_{22}\frac{d^4Z}{dx'^2dy'^2} + Q_{13}\frac{d^4Z}{dx'dy'^3} + B_2'\frac{d^4Z}{dy'^4}.
\end{aligned} \tag{xlv}$$ 


We now begin to realise some of the defects of the proposed surface. Why 
should we not introduce the six fifth order bivariate tetrachoric functions? These 
would enable us to state a surface which would have all the 21 momental coeffi- 
cients up to and including the fifth order coefficients, and the same process might 
be continued ad infinitum. 

The only defence for not doing so would lie in a demonstration that the cubic 
terms are sensibly greater than the fourth order terms. In the data with which 
we have practically to deal, the terms $B_2$, $Q_{31}$, $3Q_{22}$, $Q_{13}$ and $B_2'$ are not only not very 
much smaller than $\tfrac16\sqrt{\beta_1}$, $\tfrac12 q_{21}$, $\tfrac12 q_{12}$ and $\tfrac16\sqrt{\beta_1'}$, but are generally of the same order, 
and often, in the case of approximately symmetrical surfaces, of a much higher 
order. If there be no convergency in the series, the idea that the cubic terms will 
suffice fails completely; it is equivalent to assuming that fourth order terms are 
essentially normal (i.e. $\beta_2 = 3$, $\beta_2' = 3$, $q_{31} = q_{13} = 3r$, and $q_{22} = 1 + 2r^2$), but the third 
order terms are not. Actual frequency distributions diverge just as much in their 
fourth order terms from normality as in their third order terms, i.e. we are no more 
likely to have $\beta_2 = \beta_2' = 3$ or $q_{22} = 1 + 2r^2$, than $\sqrt{\beta_1} = \sqrt{\beta_1'} = 0$ or $q_{21} = 0$. 

Another important criticism will be obvious on examination of our Equations 
(x) to (xv) and (xxxv) to (xliii); it will be seen that the constants of the polynomial 
proceed by increasing powers of the inverse of $(1 - r^2)$. Hence there is an 
actual source of divergence in the series, especially potent when $r$ is large. We 
have no right to drop out $a_0$ and retain $a_1$ and $a_2$, to drop out the quadratic terms 
$b_1$, $2b_2$ and $b_3$ and retain $c_1$, $c_2$, $c_3$ and $c_4$. This source of divergence is screened 
when the polynomial is replaced by the differential functions*, which only shorten 
the length of the expression but do not assist computation. 

It may be said that precisely the same criticisms will apply to the omission of 
fifth and higher differential functions. This may readily be admitted at least by 
one who has no à priori affection for these expansions in terms of functions derived 
from the normal surface. But there is a certain betterment for the following 
reasons : 

(i) We cannot possibly get decent fits to univariate frequency distributions 
without going as far as the fourth moment. Hence we are unlikely to get decent 
fits to frequency surfaces unless we go as far as the fourth momental coefficients. 

(ii) In all these cases theoretical relations hold between the higher momental 
coefficients and the momental coefficients selected for fitting. But the higher 
momental coefficients have larger and larger probable errors. Hence the further 
up we can push the series for selected moments the more likely it is that the 
theoretical relations referred to will be satisfied closely enough for practical work. 

The real test is: Can we get anything like goodness of fit from an identity of 
15 momental constants? I am convinced that taking only 6 momental constants 
(i.e. neglecting $a_0$, $b_1$, $b_2$, $b_3$, $d_1$, $d_2$, $d_3$, $d_4$, $d_5$) leads to an impasse, for it does not 
allow for any deviation in kurtosis from normality. 

* There has been no attempt yet to tabulate the bivariate tetrachoric functions, yet without 
tabulation they are of small service. But if it ever be attempted the influence of the $1/(1 - r^2)$ factor will 
soon make itself felt. 

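(4) Marginal Totals. I proceed now to discuss certain properties, starting 
with the marginal totals. I integrated (i) for values of $y$ from $-\infty$ to $+\infty$, and 
so reached the curve of the $x$-totals. It was: 

$$\begin{aligned}
z_x = \frac{N}{\sqrt{2\pi}\,\sigma_1}\, e^{-\frac12\frac{x^2}{\sigma_1^2}} \Big\{ & 1 - a_0 + b_3(1 - r^2) + 3(1 - r^2)^2 d_5 \\
& + \frac{x}{\sigma_1}\big(a_1 + ra_2 + c_3(1 - r^2) + 3r(1 - r^2)c_4\big) \\
& + \frac{x^2}{\sigma_1^2}\big(b_1 + 2rb_2 + r^2b_3 + 3(1 - r^2)(d_3 + rd_4) + 6r^2(1 - r^2)d_5\big) \\
& + \frac{x^3}{\sigma_1^3}\big(c_1 + rc_2 + r^2c_3 + r^3c_4\big) \\
& + \frac{x^4}{\sigma_1^4}\big(d_1 + rd_2 + 3r^2d_3 + r^3d_4 + r^4d_5\big)\Big\}.
\end{aligned}$$ 

Substituting the values of the constants, $r$ disappears, and this reduces to the 
anticipated form: 

$$z_x = \frac{N}{\sqrt{2\pi}\,\sigma_1}\, e^{-\frac12\frac{x^2}{\sigma_1^2}} \Big\{1 + \tfrac18(\beta_2 - 3) - \tfrac12\sqrt{\beta_1}\,\frac{x}{\sigma_1} - \tfrac14(\beta_2 - 3)\frac{x^2}{\sigma_1^2} + \tfrac16\sqrt{\beta_1}\,\frac{x^3}{\sigma_1^3} + \tfrac1{24}(\beta_2 - 3)\frac{x^4}{\sigma_1^4}\Big\},$$ 

which may be written: 

$$z_x = \frac{N}{\sqrt{2\pi}\,\sigma_1}\, e^{-\frac12\frac{x^2}{\sigma_1^2}} \Big\{1 + \tfrac16\sqrt{\beta_1}\Big(\frac{x^3}{\sigma_1^3} - 3\frac{x}{\sigma_1}\Big) + \tfrac1{24}(\beta_2 - 3)\Big(\frac{x^4}{\sigma_1^4} - 6\frac{x^2}{\sigma_1^2} + 3\Big)\Big\}. \tag{xlvi}$$ 

The corresponding $y$-marginal curve is 

$$z_y = \frac{N}{\sqrt{2\pi}\,\sigma_2}\, e^{-\frac12\frac{y^2}{\sigma_2^2}} \Big\{1 + \tfrac16\sqrt{\beta_1'}\Big(\frac{y^3}{\sigma_2^3} - 3\frac{y}{\sigma_2}\Big) + \tfrac1{24}(\beta_2' - 3)\Big(\frac{y^4}{\sigma_2^4} - 6\frac{y^2}{\sigma_2^2} + 3\Big)\Big\}. \tag{xlvii}$$ 

The factors $\tfrac16\sqrt{\beta_1}$, $\tfrac1{24}(\beta_2 - 3)$, $\tfrac16\sqrt{\beta_1'}$, $\tfrac1{24}(\beta_2' - 3)$ are seen to be multiplied by 
the simple fourth and fifth order tetrachoric functions in $x$ and $y$. 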
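
That the marginal curve carries the intended constants can be confirmed by direct quadrature. The sketch below assumes the marginal form with the cubic bracket $u^3 - 3u$ and the quartic bracket $u^4 - 6u^2 + 3$, where $u = x/\sigma_1$; the function names, the trapezoidal rule, the integration range of twelve standard deviations and the trial values in the usage note are all choices made for illustration only.

```python
import math


def marginal(x, N, sigma, root_beta1, beta2):
    """Ordinate of the x-marginal curve; root_beta1 is the signed sqrt(beta_1)."""
    u = x / sigma
    normal = N / (math.sqrt(2 * math.pi) * sigma) * math.exp(-0.5 * u * u)
    return normal * (1 + root_beta1 / 6 * (u**3 - 3 * u)
                     + (beta2 - 3) / 24 * (u**4 - 6 * u * u + 3))


def moment(k, N, sigma, root_beta1, beta2, half_width=12.0, steps=24000):
    """Integral of x^k times the marginal ordinate, by the trapezoidal rule."""
    h = 2 * half_width * sigma / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width * sigma + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * x**k * marginal(x, N, sigma, root_beta1, beta2)
    return total * h
```

With $N = 1000$, $\sigma_1 = 2$, $\sqrt{\beta_1} = 0.5$ and $\beta_2 = 3.4$, for example, the moments of orders 0 to 4 reproduce the total, a zero mean, the variance, and the third and fourth moment coefficients that were supplied.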


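If we define the $s$th order tetrachoric function to be 

$$\tau_s(x) = \frac{1}{\sqrt{s!}}\left(-\frac{d}{dx}\right)^{s-1}\frac{1}{\sqrt{2\pi}}\,e^{-\frac12x^2} = \frac{1}{\sqrt{s!}\,\sqrt{2\pi}}\,e^{-\frac12x^2}\,p_{s-1}(x), \tag{xlviii}$$ 

where 

$$p_{s-1}(x) = x^{s-1} - \frac{(s-1)(s-2)}{2\cdot 1!}\,x^{s-3} + \frac{(s-1)(s-2)(s-3)(s-4)}{2^2\cdot 2!}\,x^{s-5} - \text{etc.} \tag{xlix}$$ 

it is easy to expand any expression in tetrachoric functions, if its momental 
coefficients be known. For these functions are semi-normal functions, i.e. 

$$\int_{-\infty}^{+\infty}\tau_s(x)\,\tau_{s'}(x)\,e^{\frac12x^2}\,dx = 0,$$ 
if $s$ be not equal to $s'$, and 

$$\int_{-\infty}^{+\infty}\tau_s^2(x)\,e^{\frac12x^2}\,dx = \frac{1}{s\sqrt{2\pi}}. \tag{l}$$ 

But no satisfactory proof of the convergence of such expansions has yet been 
given. If $F(x)$ be any function of $x$: 

$$\begin{aligned}
\int_{-\infty}^{+\infty}F(x)\,\tau_s(x)\,e^{\frac12x^2}\,dx &= \int_{-\infty}^{+\infty}F(x)\,p_{s-1}(x)\,dx \times \frac{1}{\sqrt{s!}\,\sqrt{2\pi}} \\
&= \frac{1}{\sqrt{s!}\,\sqrt{2\pi}}\Big\{\mu'_{s-1} - \frac{(s-1)(s-2)}{2\cdot 1!}\,\mu'_{s-3} + \frac{(s-1)(s-2)(s-3)(s-4)}{2^2\cdot 2!}\,\mu'_{s-5} - \text{etc.}\Big\}.
\end{aligned} \tag{li}$$ 

Hence for successive $s$’s we have 

$$\begin{aligned}
\int_{-\infty}^{+\infty}F(x)\,\tau_1(x)\,e^{\frac12x^2}\,dx &= \frac{1}{\sqrt{2\pi}}, \\
\int_{-\infty}^{+\infty}F(x)\,\tau_2(x)\,e^{\frac12x^2}\,dx &= \frac{1}{\sqrt2}\,\frac{1}{\sqrt{2\pi}}\,\mu_1' = 0, \\
\int_{-\infty}^{+\infty}F(x)\,\tau_3(x)\,e^{\frac12x^2}\,dx &= \frac{1}{\sqrt6}\,\frac{1}{\sqrt{2\pi}}\,(\mu_2' - 1) = 0, \\
\int_{-\infty}^{+\infty}F(x)\,\tau_4(x)\,e^{\frac12x^2}\,dx &= \frac{1}{\sqrt{24}}\,\frac{1}{\sqrt{2\pi}}\,(\mu_3' - 3\mu_1') = \frac{1}{\sqrt{24}}\,\frac{1}{\sqrt{2\pi}}\,\sqrt{\beta_1}, \\
\int_{-\infty}^{+\infty}F(x)\,\tau_5(x)\,e^{\frac12x^2}\,dx &= \frac{1}{\sqrt{120}}\,\frac{1}{\sqrt{2\pi}}\,(\mu_4' - 6\mu_2' + 3) = \frac{1}{\sqrt{120}}\,\frac{1}{\sqrt{2\pi}}\,(\beta_2 - 3), \\
\int_{-\infty}^{+\infty}F(x)\,\tau_6(x)\,e^{\frac12x^2}\,dx &= \frac{1}{\sqrt{720}}\,\frac{1}{\sqrt{2\pi}}\,(\mu_5' - 10\mu_3' + 15\mu_1') = \frac{1}{\sqrt{720}}\,\frac{1}{\sqrt{2\pi}}\,(\beta_3'^{\,*} - 10\sqrt{\beta_1}), \\
\int_{-\infty}^{+\infty}F(x)\,\tau_7(x)\,e^{\frac12x^2}\,dx &= \frac{1}{\sqrt{5040}}\,\frac{1}{\sqrt{2\pi}}\,(\mu_6' - 15\mu_4' + 45\mu_2' - 15\mu_0') \\
&= \frac{1}{\sqrt{5040}}\,\frac{1}{\sqrt{2\pi}}\,(\beta_4 - 15\beta_2 + 30) = \frac{1}{\sqrt{5040}}\,\frac{1}{\sqrt{2\pi}}\,(\beta_4 - 15 - 15(\beta_2 - 3)),
\end{aligned} \tag{lii}$$ 

and so on. 

* $\beta_3' = \mu_5' = \mu_5/\sigma_1^5$ here and is not the more usual $\beta_3 = \mu_3'\mu_5' = \mu_3\mu_5/\sigma_1^8$. 

We have accordingly the following result, when we take $z_x = F(x)$, any 
function of $x$: 

$$z_x = \frac{N}{\sigma_1}\Big\{\tau_1 + \frac{4}{\sqrt{4!}}\sqrt{\beta_1}\,\tau_4 + \frac{5}{\sqrt{5!}}(\beta_2 - 3)\,\tau_5 + \frac{6}{\sqrt{6!}}(\beta_3' - 10\sqrt{\beta_1})\,\tau_6 + \frac{7}{\sqrt{7!}}(\beta_4 - 15 - 15(\beta_2 - 3))\,\tau_7 + \text{etc.}\Big\}. \tag{liii}$$ 

In this case $x/\sigma_1$ will be the argument of the tetrachoric functions, and we 
must take 

$$\tau_s\!\left(\frac{x}{\sigma_1}\right) = \frac{1}{\sqrt{s!}}\left(-\sigma_1\frac{d}{dx}\right)^{s-1}\frac{1}{\sqrt{2\pi}}\,e^{-\frac12\frac{x^2}{\sigma_1^2}}.$$ 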


With my definition of the tetrachoric function, which introduces the factor $1/\sqrt{s!}$ 


and so differs from continental usage, all these functions, 7, excluded, are of 
essentially the same order*. We cannot therefore assert convergence in these 
functions in themselves. Next consider the numerical terms, i.e. $s/\sqrt{s!}$; we have 
for 4, 5, 6, 7, etc. the values: 

.816,497, .456,435, .223,607, .098,601, etc. 

The second of these terms is 55% of the first, and the fourth of them more 
than 12%. There are few cases in which we can afford to neglect 12%; none in 
which we can neglect 55%! In fact the sum of the numerical coefficients following 
the $\tau_4$ term amounts to 100% of that of the coefficient of that term. On the basis 
of the numerical factors only we cannot assert that it is justifiable to retain $\tau_4$ and 
neglect $\tau_5$, $\tau_6$, $\tau_7$, etc. 
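
The numerical factors $s/\sqrt{s!}$ are easily recomputed. The short sketch below is illustrative only: the names `numerical_factor`, `factors` and `tail` are hypothetical, and the sum of the later factors is simply cut off at $s = 59$, beyond which the terms are negligible.

```python
import math


def numerical_factor(s):
    """The factor s / sqrt(s!) multiplying the s-th tetrachoric function."""
    return s / math.sqrt(math.factorial(s))


# Factors for the terms in tau_4, tau_5, tau_6 and tau_7.
factors = {s: numerical_factor(s) for s in range(4, 8)}

# Sum of all the factors that follow the tau_4 term (truncated at s = 59).
tail = sum(numerical_factor(s) for s in range(5, 60))
```

The ratio `tail / factors[4]` is the quantity spoken of above as amounting to roughly 100% of the coefficient of the $\tau_4$ term.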

Thus our belief in the convergency of this tetrachoric expansion must depend 
upon terms like 

$$\sqrt{\beta_1},\quad (\beta_2 - 3),\quad (\beta_3' - 10\sqrt{\beta_1}),\quad (\beta_4 - 15 - 15(\beta_2 - 3)),\quad \text{etc.}$$ 
forming a rapidly converging series. 

Now these constants are all linear measures of the deviation of the function 
from a normal curve of errors. Now it will be found that in a very large number 
of practical cases these normal curve excesses, so far from converging, tend to 
diverge. I will illustrate this by a simple case. The curve 

$$y = y_0\left(1 + .351{,}364\,\frac{x}{\sigma}\right)^{9} e^{-3.162{,}278\,x/\sigma}$$ 

falls quite within the bounds of ordinary everyday statistical experience. It gives 
$\beta_1 = .4$, $\beta_2 = 3.6$, $\beta_3' = 7.08350$ and $\beta_4 = 29.20$. 
Hence the normal curve excess factors are respectively: 

.632,456, .600,000, .758,942 and 5.200,000. 


* Compare the numerical values of these functions in Tables for Statisticians and Biometricians, 
Table XXIX. 

These show obvious signs of divergence, which are emphasised if we go to the 
terms involving $\tau_8$ and $\tau_9$. The source of this is the simple fact, that if the low $\beta$’s 
diverge at all from the normal, it will be found that the higher $\beta$’s diverge in a 
far higher proportion. Experience shows no convergency in the normal curve 
excess factors. 

If we multiply the above values by the corresponding $s/\sqrt{s!}$ factors, we obtain 
the series: 

$$\tau_1 + .516{,}398\,\tau_4 + .273{,}861\,\tau_5 + .169{,}705\,\tau_6 + .512{,}725\,\tau_7 + \text{etc.}$$ 

Here it is clear that the contribution of $\tau_7$ may be as important as that of $\tau_4$, 
and that it is quite impossible to retain the $\tau_4$ term and neglect the $\tau_5$, $\tau_6$ and $\tau_7$ 
terms. There is no convergency about these tetrachoric expansions. It is therefore 
unlikely that our surface, which is really an expansion in bivariate tetrachorics, is 
going to give highly satisfactory results, notwithstanding its fifteen momental 
identities. 
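
The figures of this illustration can be recovered from the cumulants of the curve. The sketch below rests on an assumption not stated in the text: that the curve, having the exponent 9, is the Type III curve whose standardised cumulants are $\kappa_n = (n-1)!\,k^{1-n/2}$ with $k = 10$. The names `hermite_expectations`, `shape`, `kappa`, `excess` and `series` are hypothetical.

```python
import math


def hermite_expectations(cumulants):
    """Normal curve excess factors of the tau_4 to tau_7 terms.

    cumulants = (kappa_3, kappa_4, kappa_5, kappa_6) of a variate in standard
    units; the moments are rebuilt first and the excess factors formed as
    sqrt(beta_1), beta_2 - 3, beta_3' - 10 sqrt(beta_1), beta_4 - 15 - 15(beta_2 - 3).
    """
    k3, k4, k5, k6 = cumulants
    mu3, mu4 = k3, k4 + 3
    mu5 = k5 + 10 * k3
    mu6 = k6 + 15 * k4 + 10 * k3**2 + 15
    return (mu3, mu4 - 3, mu5 - 10 * mu3, mu6 - 15 - 15 * (mu4 - 3))


shape = 10  # assumed: exponent 9 in the curve, plus one
kappa = tuple(math.factorial(n - 1) * shape ** (1 - n / 2) for n in (3, 4, 5, 6))
excess = hermite_expectations(kappa)

# Coefficients of tau_4, tau_5, tau_6, tau_7 in the tetrachoric series.
series = [s / math.sqrt(math.factorial(s)) * e
          for s, e in zip((4, 5, 6, 7), excess)]
```

The list `series` may be compared with the coefficients of $\tau_4$ to $\tau_7$ quoted above; the last of them is nearly as large as the first.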

It is needful for the application of tetrachoric expansions that the higher 
normal excess factors should be shown to grow less and less. This in practical 
experience they certainly do not. How then has arisen the idea that such ex- 
pansions are reasonable? I think it has arisen from forming hypergeometrical 
series for urn-problems (as for Bayes’ Theorem in the manner of Laplace), and 
then approximating to the terms of such series by Stirling’s Theorem, with such 
limiting hypotheses as that the sample was small compared to the size of the 
population in the urn, etc., etc. Good illustrations may be found in the manner 
in which Edgeworth deduces his skew curve of frequency*, which amounts to 

Zy = T+ UT, 
and a somewhat similar proof is given by Bowley†. In both cases, it is assumed that 
there are $n$ elemental groups, or what I should term “contributory cause-groups,” 
and these are supposed uncorrelated. Subject to the relation that $\mu_p/\sigma^p$ is finite 
for all values of $p$ in the contributory cause-groups, 

$$z_x = \frac{N}{\sigma}\Big\{\tau_1 + \frac{\sqrt{\bar\beta_1}}{6\sqrt{n}}\,\sqrt{24}\,\tau_4 + \frac{\bar\beta_2 - 3}{24n}\,\sqrt{120}\,\tau_5 + \text{etc.}\Big\},$$ 

where $\bar\beta_1$ and $\bar\beta_2$ are the mean $\beta$-constants for the contributory cause-groups. It 
is then argued that the coefficient of $\tau_5$ is of the order $1/\sqrt{n}$ as compared with the 
coefficient of $\tau_4$, and may therefore be neglected. 

* “The Law of Error.” Cambridge Philosophical Transactions, Vol. xx. pp. 36-65, 1904. 

† Elements of Statistics, p. 291, 1920. 

Now in the first place the assumption that $\mu_p/\sigma^p$ less its normal value tends 
to a small finite value is not legitimate. For fairly frequent values of $\beta_1$ and $\beta_2$, 
i.e. 0.5 and 4, we have $\beta_5 = 120$, and $\beta_6 = 878$, when for normal values they would be 
zero and 105. In other words if low $\beta$’s diverge somewhat from the normal, high 
values diverge with great rapidity, far more rapidly in fact than terms in $1/\sqrt{n}$, 
unless $n$ is very large, descend. But the main assumption that frequency 
distributions are in general due to an indefinitely large number of small contributory 
cause-groups is not only unproven but can be demonstrated to be incorrect. 
Professors Edgeworth and Bowley make the factors multiplying $\tau_4$ and $\tau_5$ to 
contain the mean values of $\mu_3/\sigma^3$ and $\mu_4/\sigma^4 - 3$ for the contributory groups. But 
why should the contributory groups have their skewness all in the same sense? 
Why should all the contributory causes be mesokurtic or leptokurtic? If they 
are not the ratio of the mean of values of $\mu_3/\sigma^3$ to that of values of $\mu_4/\sigma^4 - 3$ may 
take any value whatever, and there is no proof at all that: 

$$\frac{1}{\sqrt{n}}\,\overline{\left(\frac{\mu_3}{\sigma^3}\right)} \ \text{ is indefinitely larger than } \ \frac{1}{n}\,\overline{\left(\frac{\mu_4}{\sigma^4} - 3\right)},$$ 

where the rules stand for mean value. This argument becomes still more cogent 
when we bear in mind that, except by way of hypothesis, we have no real evidence 
of this indefinitely large multiplicity of contributory cause-groups. It was called 
in question by Galton many years ago*. And we can put to a definite proof its 
legitimacy. The ratio of the coefficient of $\tau_5$ to $\tau_4$, according to Edgeworth and 
Bowley, is: 

$$\frac{\sqrt5}{4}\,\frac{1}{\sqrt{n}}\,\frac{\bar\beta_2 - 3}{\sqrt{\bar\beta_1}}, \quad\text{but}\quad \frac{\sqrt5}{4}\,\frac{\beta_2 - 3}{\sqrt{\beta_1}} \ \text{ actually}$$ 

by our analysis above. Hence, if $n$ be indefinitely large, $\beta_2 - 3$ must be indefinitely 
small as compared with $\sqrt{\beta_1}$; but this is a result contrary to all experience. $\beta_2 - 3$ 
may have any ratio as compared to $\sqrt{\beta_1}$, and if we had on the basis of observed 
data to make any statement, all we could hazard would be that these quantities are 
very much of the same order. Thus either the hypothesis of an indefinitely large 
number of contributory cause-groups is false, or else the mean $\beta_1$ for those groups 
is indefinitely small, i.e. $\sqrt{\bar\beta_1}$ is negligible for those groups as compared to $\bar\beta_2 - 3$. 
In either case the hypothesis adopted by Edgeworth and Bowley is contradicted. 
The tetrachoric $\tau_5$ term is not negligible as compared with $\tau_4$. 
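
The rapidity with which the higher $\beta$’s run away from their normal values, instanced above for $\beta_1 = 0.5$ and $\beta_2 = 4$, can be reproduced by a short recursion. The sketch below assumes that the figures refer to a curve of the ordinary skew system with a quadratic denominator, written here as $\frac{1}{z}\frac{dz}{dx} = (a - x)/(b_0 + b_1x + b_2x^2)$ with the origin at the mean and $\sigma = 1$, and that $\beta_5 = \mu_3\mu_7/\sigma^{10}$ and $\beta_6 = \mu_8/\sigma^8$; the function name is hypothetical and the sign convention is one of convenience.

```python
import math


def pearson_system_moments(beta1, beta2, highest=8):
    """Moments mu_0 ... mu_highest about the mean, in standard units."""
    mu3, mu4 = math.sqrt(beta1), beta2
    # Constants of the differential equation fixed by the first four moments.
    b2 = (2 * mu4 - 3 * mu3**2 - 6) / (2 * (5 * mu4 - 6 * mu3**2 - 9))
    b0 = 1 - 3 * b2
    b1 = mu3 * (1 - 4 * b2) / 2  # and a = -b1 in this sign convention
    mu = [1.0, 0.0, 1.0]
    for n in range(2, highest):
        # mu_{n+1} (1 - (n + 2) b2) = n b0 mu_{n-1} + n b1 mu_n
        mu.append((n * b0 * mu[n - 1] + n * b1 * mu[n]) / (1 - (n + 2) * b2))
    return mu
```

Calling `pearson_system_moments(0.5, 4.0)` and forming `mu[3] * mu[7]` and `mu[8]` gives the two constants in question, to be set against the normal values zero and 105.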


It is, of course, equally legitimate to argue that the $\tau_6$ and higher terms cannot 
be neglected. We can only take comfort by hoping that a curve which has the 
first four moments identical with those of the observed data will be to some extent 
better than one which has only three. 


It may, of course, be retorted that the skew curves introduced by the present 
writer only introduce four moment coefficients, and therefore there is no à priori 
reason for supposing them to be in a position to give better results than the curves 
based on tetrachoric expansions. The answer is fairly straightforward. While the 
tetrachoric expansions do not converge in terms of excess or defect from the normal 
distribution, the skew curves in question do give convergence in terms of these 
excesses or defects. 


* “Statistics by Intercomparison with Remarks on the Law of Frequency of Error.” Phil. Mag. 
Vol. xlix. pp. 39, 40, 1875. 

We start from the differential equation referred to its mean, namely: 


az 
= 
o 





beh 4h ot 
oC o~ 


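Here $b_0$ is of the order unity, $a$ and $b_1$ of the order $\sqrt{\beta_1} = \mu_3/\sigma^3$, and $b_2$ of the 
order $\beta_1$ and $\beta_2 - 3$. Let us write $\sqrt{\beta_1} = \beta_1'$, and $\beta_3' = \mu_5/\sigma^5$, $\beta_4 = \mu_6/\sigma^6$, and $\beta_2$ as 
usual $\mu_4/\sigma^4$. Then if we proceed to find our series of $b$’s up to $b_3$ in the usual 
manner*, we obtain 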


b. 288.8, + 4B8,?B,' + 28,8; — 188; — 6B; By? — 128," 





it iosoie — 1268, — 1008.' + 2108.2 — 548, — 848,68,” + 608,78,” —4 4908,8;") 
+ 1418,'B,'B, — 188," — 728," + 2978,'B,' + 1928, 


Here the denominator approaches 2 × 144 as the curve approaches the normal,
i.e. is finite and large. The numerator can be rearranged as:


2 {26 (8, -3)8,'+ (8,—3) 8, - 68; (8-10) 


+ 2(B,—3) 8 — 38/83 — 68,4. 
Here all the terms in the second line are of the cubic order; in the first line,
the first two are obviously of the square order. We shall now show that the third
term is also of the square order. Consider the binomial $(p + q)^n$; we have:

$$\mu_2 = npq = \sigma^2, \quad \mu_3 = npq\,(p - q) \quad \text{and} \quad \mu_5 = npq\,(p - q)\,\{1 + 2pq\,(5n - 6)\}.$$

Hence:

$$\beta_1' = \frac{npq\,(p - q)}{(npq)^{3/2}}, \qquad \beta_3' = \frac{npq\,(p - q)\,\{1 + 2pq\,(5n - 6)\}}{(npq)^{5/2}}.$$

Accordingly:

$$\beta_3'/\beta_1' = \frac{1}{npq} + 10 - \frac{12}{n},$$

or, when $n$ is made indefinitely large:

$$\beta_3'/\beta_1' = 10.$$


But, if $n$ be made indefinitely large, and neither $p$ nor $q$ be very small, the binomial
passes over into the normal curve. Accordingly the ratio $\beta_3'/\beta_1'$ approaches 10
under the condition of approach to the normal curve, and $\beta_1'(\beta_3'/\beta_1' - 10)$ is
therefore of the second order of small quantities. This convergency of the terms† of
the series $b_0$, $b_1$, $b_2$, $b_3$, etc. explains why the types of curves obtained from the
above differential equation give far more satisfactory results than tetrachoric
expansions such as those proposed by Edgeworth, Charlier and others, which are
actually divergent.
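The limiting value 10 of the ratio can be checked numerically. The following is a minimal sketch, assuming only the ordinary definition of the central moments of the binomial $(p + q)^n$; the function names are illustrative and do not come from the memoir. It forms $\beta_3'/\beta_1' = \mu_5/(\mu_3\mu_2)$ by direct summation and sets it beside the closed form $1/(npq) + 10 - 12/n$.

```python
from math import comb


def binomial_central_moment(n, p, k):
    """k-th central moment of the binomial (p + q)^n by direct summation."""
    mean = n * p
    return sum(
        comb(n, s) * p**s * (1 - p) ** (n - s) * (s - mean) ** k
        for s in range(n + 1)
    )


def beta_ratio(n, p):
    """beta_3'/beta_1' = (mu_5/sigma^5) / (mu_3/sigma^3) = mu_5 / (mu_3 * mu_2)."""
    mu2 = binomial_central_moment(n, p, 2)
    mu3 = binomial_central_moment(n, p, 3)
    mu5 = binomial_central_moment(n, p, 5)
    return mu5 / (mu3 * mu2)


def beta_ratio_closed_form(n, p):
    """Closed form of the same ratio: 1/(npq) + 10 - 12/n."""
    q = 1 - p
    return 1 / (n * p * q) + 10 - 12 / n


if __name__ == "__main__":
    # p = 0.3 is an arbitrary illustrative choice; p = 0.5 must be avoided
    # because mu_3 vanishes there and the ratio is undefined.
    for n in (10, 100, 400):
        print(n, beta_ratio(n, 0.3), beta_ratio_closed_form(n, 0.3))
```

The summation is kept to moderate $n$ because the binomial coefficients are converted to floating point; for very large $n$ the closed form alone should be used.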

* "On the General Theory of Skew Correlation and Non-Linear Regression," p. 5. Drapers'
Company Research Memoirs, 1905, Cambridge University Press.

† The statement of the convergency of this series is not novel, it was made in 1905. I have had
many years in my possession the curves resulting from retaining higher denominator powers than the
second, but except for a few extreme cases the changes are too slight to be of any profit in dealing with
actual frequency distributions.
(5) Form of the Regression Curves. Clearly if $\bar{y}_x$ be the mean of the $x$-array
of $y$'s:

$$z_x\,\bar{y}_x = \int_{-\infty}^{+\infty} z\,y\,dy.$$

The integration was performed as before by writing $y - r\frac{\sigma_2}{\sigma_1}x = Y$, and
integrating with regard to $Y$. The integration was somewhat laborious, but after
substitution of the constants and making certain transformations, I found:


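$$\frac{\bar{y}_x}{\sigma_2} = r\frac{x}{\sigma_1} + \frac{\tfrac{1}{2}\,(q_{21} - r\sqrt{\beta_1})\left(\frac{x^2}{\sigma_1^2} - 1\right) + \tfrac{1}{6}\,(q_{31} - r\beta_2)\left(\frac{x^3}{\sigma_1^3} - 3\frac{x}{\sigma_1}\right)}{1 + \tfrac{1}{6}\sqrt{\beta_1}\left(\frac{x^3}{\sigma_1^3} - 3\frac{x}{\sigma_1}\right) + \tfrac{1}{24}\,(\beta_2 - 3)\left(\frac{x^4}{\sigma_1^4} - 6\frac{x^2}{\sigma_1^2} + 3\right)} \qquad \ldots\text{(lv)}.$$

Assuming the distribution differs little from the normal the denominator under
the long rule may be put unity, and the regression curve will run:

$$\frac{\bar{y}_x}{\sigma_2} = r\frac{x}{\sigma_1} + \tfrac{1}{2}\,(q_{21} - r\sqrt{\beta_1})\left(\frac{x^2}{\sigma_1^2} - 1\right) + \tfrac{1}{6}\,(q_{31} - r\beta_2)\left(\frac{x^3}{\sigma_1^3} - 3\frac{x}{\sigma_1}\right) \qquad \ldots\text{(lv) bis}.$$

If $q_{31} - r\beta_2 = 0$ we have parabolic regression, but as a rule there is no more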
reason for neglecting p, than p,* ; indeed experience shows that a cubic is generally 
more the type of curve needful to describe regression than a parabola. If the surface 
be not close to the normal we cannot neglect the p, and p, terms in the denomi- 
nator, and the regression curve will accordingly then be a quintic. There is not much 
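difficulty, however, in computing $\bar{y}_x$ for a given $x$; for multiplying the numerator
and denominator by $\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2/\sigma_1^2}$, we can read the result:

$$\frac{\bar{y}_x}{\sigma_2} - r\frac{x}{\sigma_1} = \frac{\sqrt{\tfrac{3}{2}}\,(q_{21} - r\sqrt{\beta_1})\,\tau_3 + \sqrt{\tfrac{2}{3}}\,(q_{31} - r\beta_2)\,\tau_4}{\tau_1 + \sqrt{\tfrac{2}{3}}\sqrt{\beta_1}\,\tau_4 + \sqrt{\tfrac{5}{24}}\,(\beta_2 - 3)\,\tau_5} \qquad \ldots\text{(lvi)}.$$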
The coefficients of the tetrachoric functions will be provided as numerics by
the data and the values of $\tau_1$, $\tau_3$, $\tau_4$ and $\tau_5$ can be taken at once from the tables.
It is accordingly not hard to test the regression curve. It should be noted that
when $x = 0$,

$$\bar{y}_0 = -\tfrac{1}{2}\sigma_2\,(q_{21} - r\sqrt{\beta_1})/\{1 + \tfrac{1}{8}(\beta_2 - 3)\},$$

or the regression curve does not pass through the mean (i.e. the origin $y = 0$).
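The regression curve of Equation (lv) is easy to evaluate by machine. The sketch below is an illustration only: the function name and argument names are hypothetical, the argument `t` stands for $x/\sigma_1$, and the returned value is $\bar{y}_x/\sigma_2$. It assumes the form of (lv) in which both numerator and denominator are built from the polynomials $t^2 - 1$, $t^3 - 3t$ and $t^4 - 6t^2 + 3$.

```python
def regression_mean(t, r, q21, q31, sqrt_beta1, beta2):
    """Return ybar_x / sigma_2 for t = x / sigma_1, following Equation (lv)."""
    he2 = t * t - 1.0
    he3 = t**3 - 3.0 * t
    he4 = t**4 - 6.0 * t * t + 3.0
    numerator = 0.5 * (q21 - r * sqrt_beta1) * he2 + (q31 - r * beta2) * he3 / 6.0
    denominator = 1.0 + sqrt_beta1 * he3 / 6.0 + (beta2 - 3.0) * he4 / 24.0
    return r * t + numerator / denominator


if __name__ == "__main__":
    # Illustrative constants only, not taken from any table in the text.
    r, sqrt_beta1, beta2 = 0.5, 0.2, 3.4
    # With q21 = r*sqrt(beta1) and q31 = r*beta2 the curve reduces to the line r*t.
    print(regression_mean(1.3, r, r * sqrt_beta1, r * beta2, sqrt_beta1, beta2))
    # Otherwise the curve misses the origin at t = 0.
    print(regression_mean(0.0, r, 0.15, 1.9, sqrt_beta1, beta2))
```

When the linearity conditions hold the numerator vanishes identically, so the function returns $rt$ whatever the marginal skewness and kurtosis may be.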

Exception may be taken to (lvi) on precisely the same grounds as to the
Equations (xlv) and (xlvi), namely, that the stopping at $\tau_4$ and $\tau_5$ is not based on
any principle of convergence.

Equation (lv) provides the customary conditions for linearity of regression, i.e.
$$q_{21} = r\sqrt{\beta_1}, \quad q_{31} = r\beta_2.$$
There will be corresponding values for the regression of $x$ on $y$, if that also is
linear, i.e.
$$q_{12} = r\sqrt{\beta_1'}, \quad q_{13} = r\beta_2'.$$


* See Equation (xlix) for value of p,_;. 
Biometrika xvi 
(6) On the Scedastic Curve for the Arrays. We need to find the integral
+a 
| 2(y — 9x) dy = 220 y,.° 


+2 2 
={ e(y-r22- (j.-17 Ze) dy 
1 1 


+a — +o a 
-/ 2(¥?—Y,) ay=| 2¥°dY — 2,Y,2. 


The $\int_{-\infty}^{+\infty} zY^2\,dY$ was obtained as before by substituting $Y + r\frac{\sigma_2}{\sigma_1}x$ for $y$ in $z$ and
integrating out.


After substituting the constant values, I found 


x{a-) [1448 (S-32)+s(8-3) (2-6 5 +8)| 


i = ir (qa =F. VB;) ie (he = 7Gu)} 


/ 
/ 


ion (= 7 1) (r (Ya 7 rBz) =" b {qz0 so oat (B - 1) 


Hence dividing by z, from (xlvi) : 
TF ime 
| z¥?dY =o3(1-7°) 
$5] -« 
x“ a \ 
3G, \r (ar =f VB) — (Gi2—1Ga)} + (= _ 1) (r (Gan —rB,) — 4 (zz -l-r (B. -1 )}) 


a; 


2 xv a 2 . 
1+ 48, (=-3") +4 (@-3)("-6 243) 
1 1 





To obtain $\sigma_{y_x}^2$ we must subtract from this result $\left(\bar{y}_x - r\frac{\sigma_2}{\sigma_1}x\right)^2$, which is given
by (liv).

Thus we have finally : 
Cy." = a? a oo r?) 


F 1 thd 
= {r (qu - 7? VB) — (G2—-1Ga)} +( : _— 1) (7 (da —1rB»)—$ {qa—1—17(B.— 1)}) 


a; 
week, 
— /# x ax xe 
1 + t VB, a= +? ay (Be — 3) (F.- 67,43) 
1 1 


(= - 1) 4 (qu —7 VB) + & (dn — PBs) ce i ) 


a7 


ae ; 7 a ao 
1+4% VB, (—-8 ) +34 (8.-3) (—-8= +8) 


az 
a; 





_ a 





This gives a curve of the ninth order for the variance and one of the tenth
order for the standard deviation of the arrays. The computation is not so serious
as might at first appear, as $\bar{y}_x - r\frac{\sigma_2}{\sigma_1}x$ will really have been computed in
determining the regression line, and if multiplied by $\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2/\sigma_1^2}$ we can again express
our results in tetrachoric functions, which are tabled. Thus


Oy =o7 {1 -r 


_ v2 {r (qu — —T NB) - = fe =F -1'9n)} T2+ V3 (r (dau — a Bs) — 4 + {qen = poy —1*(B,—1)}) Ts 

+ V3 VB, 14+ Voi (B:—3) 75 
|S Vi Ga r8) 1 
b. 1+V3 V Bit, + V3; (Be— 38) 75 J oe eee eee ee ee eee eee eee eee eee eee ee | 


If the regression be linear, $q_{21} = r\sqrt{\beta_1}$, $q_{31} = r\beta_2$, and:

lo (/ RR’ /2 = = 

eyes {1 — 924 Bs 1 — 7 VB,) rT. + £ V3 {gm —1— 7? (B,—1)} 7 "| 
™m+ v3 VB, T+ if, (8. — 3) T; 





(Iviii). 





or if the deviation from normality be not too great: 


ays og 1-18 + = (WB/— 0 VB) + J (S.-1) {te-1-7(8,-1)} 
wnsincied (Ix), 


which is parabolic scedasticity for the variance, elliptic or hyperbolic for the
standard deviation, according as $q_{22}$ is in defect or excess of its normal value.


(7) Special Cases of some Importance. Let us assume the regression to be 
linear; we will afterwards suppose the surface symmetrical as well. The latter 
case is that of a generalised whist surface. 

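If the regression be linear:

$$q_{21} = r\sqrt{\beta_1}, \quad q_{12} = r\sqrt{\beta_1'}, \quad q_{31} = \beta_2 r, \quad q_{13} = \beta_2' r \qquad \ldots\text{(lxi)}.$$

Accordingly:

$$B_1 = \tfrac{1}{24}(\beta_2 - 3), \quad B_1' = \tfrac{1}{24}(\beta_2' - 3), \quad Q_{31} = \tfrac{1}{6}r\,(\beta_2 - 3), \quad Q_{13} = \tfrac{1}{6}r\,(\beta_2' - 3),$$

$$Q_{22} = \tfrac{1}{12}(q_{22} - 1 - 2r^2) \qquad \ldots\text{(lxii)}.$$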

We now turn to Equations (x) to (xv). We find: 








For Linear Regression. For Linear Regression and Symmetry. 
a,= ai Pp {Vv B, (3r? —1)—2 VB’ rh, m= satrap VB 
a= sq aE OP-)-2V8r}, | a-s ane VE 
ao" sa 1. sp VBI 20° — VB, (81° — 1), ans eee 
= - ap VE C+ 4e), + ~on 3 +r) 
“= 9 a ee eS a= — 9 - at +r 
C= ao {/B, 2r* — VB,’ (3r?— 1)}, a=6 te a a Be , 











For Linear Regression. | For Linear Regression and Symmetry. 
mem 5 a E rays {(e + €’) (1 — 47°) + 2€(1 + 27°)}, | b= — AC = fe({1- 47°) + €(1 4 2r°)}, 
b,=- 4 . 2) \(1 — 47") e— 7? (14+ 27") eo +(1 +577) E}, Bb, =- 4 3 2 ((1 — 57° — 2r*) € + (1 + 5’) G}, 
= 9G J wp (SM (CHe) + WAHL, 2b. a5 aa {—Bre4r (249) Oy, 

b,=— 4 : ry {—7?(1+27*)e +(1—40*)e’+(14+5r") fh, | b,=— 4(1 J "y {(1 — 5r? — 2) € + (1 + 5r*) I, 
t= 94; eer {(1 — 4r*) e— 31% e' + Gr? ff, d= 54 az "yi ((1 — 4? — 3r*) € + 6r® f}, 

d, = — 6 - 26 {—3r¢—r? (2 +7") e+ 3r (147°) EI, d,=— 6 ~ {—73(5 +7") e+ 3r (1 +7") ff, 
d,= 13 az a {— 9? (1 + 2r*) (e +e’) + (1 + 407 + r*) SI, c= naz i {—2r?(1 +27r*)e+(1+4 477+ 2°) Sh, 
d,=-— 6d er {—71(2+9r*) e—3ré + 3r (14 7") &, d,=— 6(1 ay {—-r(5+r)e+ 3r(14+r*) EI, 
d;= 24 ( ") {— 3rte + (1 — 47°) & + 67? J, d,= 24 ( = ai {(1 — 40" — 3r4) € + Gr? F}, 

pani (Ixv). eee | 
We next turn to Equations (xxxv) to (xliii). We deduce, writing $\beta_2 - 3 = \epsilon$,
$\beta_2' - 3 = \epsilon'$, $q_{22} - 1 - 2r^2 = \xi$:


It will be clear that in the first case all we need are mean, standard deviation,
and the two $\beta$'s for both marginal totals, besides $r$ and $q_{22}$. For the second we
need only the constants for one marginal total, and $r$ and $q_{22}$.

I propose to illustrate now these results on one or two examples in order to test 
the value of such a 15-constant frequency surface. 

(8) Illustration I. Symmetrical Surface with Linear Regression. As an 
illustration, I take the correlation surface for the numbers of the same suit in 
two hands at whist*. Here we have: 

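$$\beta_1 = \beta_1' = .036,2667, \quad \beta_2 = \beta_2' = 2.893,2914, \quad r = -\tfrac{1}{3}, \quad \sigma_1^2 = \sigma_2^2 = 1.863,9706,$$

$$\bar{x} = \bar{y} = 3.25, \quad \sigma_1 = \sigma_2 = 1.365,2731, \quad \sqrt{\beta_1} = \sqrt{\beta_1'} = .1904,3818,$$

$$\xi = q_{22} - 1 - 2r^2 = -.044,0423, \quad \epsilon = \epsilon' = -.106,7086.$$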

There is little doubt that $q_{22}$ could be ascertained from an analysis of the
double hypergeometrical; but to save wasting time over lengthy algebra, I computed
$q_{22}$ from the Table for theoretical whist frequencies on p. 186 of Biometrika,
Vol. xvi. It may be of interest to put on record the value of $q_{22}$ as deduced from
its value for any arbitrary origin, when the surface has linear regression and is
symmetrical. We have


9 


7 “L x a oe 
Gz = Qe — 4r VB, - 2 (LH Br) ceeteete estes (Ixvii). 
l 


2 4 
1 oO; 


* See Biometrika, Vol. xvi. p. 176.

To obtain this, it was needful to show that
da = qe = 7 VB, + (1+ 2r) =a 


a 3G, 


8 
§] 


sbareabih wiebuswied (Ixviii). 


3 


In both cases we supposed the arbitrary origin to be at a symmetrical point, i.e.
$\bar{x} = \bar{y}$. In the actual computations this was taken at 3, 3, or $\bar{x}$ and $\bar{y} = .25$.


The total frequency being 25,000, we have the following numerical value for 
the equation to the 15-constant surface : 


36 a3 ae Qxy m2 :) 
2 = 2264109 ¢ 1°\%" Bais % |964, 2106 —-071 4143 (= + +¥) 


O, Gy 
+ °040,3305 (= +-675,5851 “9 , 


oy G\0_2 G2// 


a3 2 2 - 
+°026,7804 (© — = ¥ — ~. me 


of, d;70, G0; Go? 


- (005,652 25 007,6716 —Y =v -+-014,0870 * 5+ 007, 6716 


+ -005,6525 2 (Unix). 
o.'/ 

We now proceeded to find the contour lines of this surface. 

The accompanying table exhibits: (A.) the frequency of appearance of certain
numbers of cards of the same suit in two players' hands calculated from the double
hypergeometrical series of theory, (S.O.) the ordinates of the 15-constant surface, and
(S.V.) the corresponding volumes of that surface deduced by the formulae provided
in the Appendix to this paper. It will be seen that the ordinates agree much
better with the results of the series (A.) than the volumes. Nor is this to be wondered
at, for the moments were naturally calculated from discrete quantities (i.e. actual
numbers of cards) and not corrected for grouping. We now proceeded to plot the
sections of the 15-constant surface (S.O.) precisely as a year ago we plotted the contour
lines of the double hypergeometrical series and the two surfaces to the same data*.
The results were astonishingly smooth and closely resembled the contours for (A.).
In fact the two families of curves were so close that they could not be represented
as in the previous cases as superposed. Such goodness of fit had not been
anticipated.


The whist contours with cancrine forms differ widely from the ellipses of the 
normal surface. It was therefore most encouraging, considering the non-normal
character of the whist contour, to see how nearly the 15-constant surface bent its 
contour system to shapes distinctly non-normal. It was also satisfactory to learn 
that the non-converging double-tetrachoric expansion could so satisfactorily fit data 
provided we equated at least fifteen-moment constants. Of course the negative 
frequencies show themselves and these would be more conspicuous had we carried 


* Biometrika, Vol. xvi. pp. 178-181.
The Fifteen Constant Bivariate Frequency Surface


TABLE I.


Comparison of Ordinates and Volumes of 15-Constant Surface with Frequencies of Double
Hypergeometrical Series for Whist Returns.


Number of Cards in First Player's Hand of Suit X.

             0      1      2      3       4       5       6      7       8      9     10    11    12    13
0   (A.)     0      5     24     60      88      79      44     16       3      0      …     …     …     …
    S.O.    .28   4.41  25.00  66.53   94.38   81.85+  47.32  17.25+   3.55-   .38    .02   .00   .00   .00
    S.V.    .65   6.79  32.41  79.22  108.00   91.55   51.92  18.93    4.09    .51    .03   .00   .00   .00
our computation of ordinates somewhat further, especially to negative card
frequencies.


But this first illustration is undoubtedly valuable as showing that at any rate 
in frequency surfaces with a single axis of symmetry the 15-constant surface will 
provide quite a reasonable graduation. 


I next proceeded to consider the marginal totals. Now it must be remembered 
that the moments and product moments of the surface were obtained by treating 
the frequencies as discrete quantities (number of cards of given suit in hand), and 
accordingly the frequency surface is what has been termed a “spurious surface ” 
and will really represent by its ordinates rather than by its volumes the cell 
contents (see p. 285). 


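The curve of marginal totals is

$$z_x = \frac{N}{\sigma_1}\left\{\tau_1 + \sqrt{\tfrac{2}{3}}\sqrt{\beta_1}\,\tau_4 + \sqrt{\tfrac{5}{24}}\,(\beta_2 - 3)\,\tau_5 + \ldots\right\}.$$

Here:

$$\frac{1}{\sigma_1} = .7324,5419, \quad \sqrt{\beta_1} = .1904,3818, \quad (\beta_2 - 3) = -.106,7086,$$

or,

$$z_x = 18311.355\,(\tau_1 + .1554,9212\,\tau_4 - .0487,0559\,\tau_5) \qquad \ldots\text{(lxx)},$$

from which Column C of Table II has been calculated.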
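The numerical coefficients of Equation (lxx) and the entries of Column C can be reproduced from the constants of the whist surface. The following is a sketch under one stated assumption: the tetrachoric function is taken as $\tau_n(t) = He_{n-1}(t)\,\varphi(t)/\sqrt{n!}$, where $He$ denotes the Hermite polynomial in its probabilists' form and $\varphi$ the normal ordinate. That definition is not spelt out in this section; it is adopted here because it reproduces the printed coefficients. The function and constant names are illustrative.

```python
from math import exp, factorial, pi, sqrt

N = 25000                # total frequency
MEAN = 3.25              # mean number of cards of the suit
SIGMA = 1.3652731        # sigma_1
SQRT_BETA1 = 0.19043818  # sqrt(beta_1)
EXCESS = -0.1067086      # beta_2 - 3

COEFF_TAU4 = sqrt(2.0 / 3.0) * SQRT_BETA1
COEFF_TAU5 = sqrt(5.0 / 24.0) * EXCESS


def tetrachoric(n, t):
    """Assumed definition: tau_n(t) = He_{n-1}(t) * phi(t) / sqrt(n!)."""
    he = [1.0, t]
    for k in range(1, n):
        # Recurrence He_{k+1}(t) = t * He_k(t) - k * He_{k-1}(t).
        he.append(t * he[k] - k * he[k - 1])
    phi = exp(-0.5 * t * t) / sqrt(2.0 * pi)
    return he[n - 1] * phi / sqrt(factorial(n))


def marginal_ordinate(cards):
    """Ordinate of the marginal curve at a given number of cards."""
    t = (cards - MEAN) / SIGMA
    series = (
        tetrachoric(1, t)
        + COEFF_TAU4 * tetrachoric(4, t)
        + COEFF_TAU5 * tetrachoric(5, t)
    )
    return (N / SIGMA) * series


if __name__ == "__main__":
    print(N / SIGMA, COEFF_TAU4, COEFF_TAU5)
    for cards in range(0, 11):
        print(cards, round(marginal_ordinate(cards), 1))
```

The ordinates are taken at the integer numbers of cards, in keeping with the remark that the surface represents the cell contents by its ordinates rather than by its volumes.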


It will be seen that the sum of the ordinates of the surface in each array is very 
close to the corresponding marginal ordinate. The fit as measured by P is very 
good. At the same time it must be remembered that we are not dealing with 
material showing random deviations due to sampling, but with a mathematical 
function. The P’s as found for the marginal totals in the previous investigation of 
this surface* are: 

P = .000, putting $\beta_1 = 0$, and giving the true value of $\beta_2$,
P = .350, putting $\beta_1$ = the true value, but neglecting $\beta_2$.


There is thus very considerable improvement obtained by using the tetrachoric
expansion and both $\beta$'s correct.


If we use a Type I Pearson-Curve which corresponds to the true values of the
$\beta$'s we find for its equation:

$$y = y_0\left(1 - \frac{x}{11.00731}\right)^{21.13554}\left(1 + \frac{x}{5.96008}\right)^{11.44407},$$

the origin being at the mode 3.10404 cards.


* I have recomputed the P's, using the ordinates and a total of 25,000 deals not the 1000 of the
earlier investigation, Biometrika, Vol. xvi. p. 185.
[Diagram I. Horizontal axis: cards of a given suit in 1st hand.]


We have the following results : 


TABLE II.

Number of     A: Actual          B: Sum of           C: Theoretical    Δ = C - A    Δ²/A
Cards of      (i.e. terms of     Ordinates of        Curve
given suit    Hypergeometrical)  Surface in Array

    0            319.8              340.97              341.0           +21.2       1.405
    1           2001.6             1957.32             1956.4           -45.2       1.021
    2           5146.8             5133.50             5134.1           -12.7        .031
    3           7158.2             7216.45             7218.1           +59.9        .501
    4           5965.2             5954.41             5952.7           -12.5        .026
    5           3117.3             3095.18             3095.4           -21.9        .154
    6           1039.1             1045.88             1046.6           + 7.5        .054
    7            220.4              222.97              223.7           + 3.3        .049
    8             29.2               28.22               28.6           - 0.6        .012
    9              2.3                2.02                2.1           - 0.2        .017
   10              0.1                 .08                0.1             0.0        .000
   11              .0002               .00                .00             .00
   12              .0000               .00                .00             .00
   13              .0000               .00                .00             .00

χ² (for n′ = 11) = 3.270, P = .972.
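The value of χ² quoted for Table II can be recovered from Columns A and C. The sketch below is illustrative: it takes each contribution as Δ²/A with Δ = C - A, which is what the last column of the table shows, and it reads Pearson's n′ = 11 groups as ten degrees of freedom when computing P. The function names are hypothetical, and the tail probability uses the elementary closed form valid for an even number of degrees of freedom, so small differences from the tabled P are to be expected.

```python
from math import exp

# Columns A (actual) and C (theoretical curve) of Table II, 0 to 10 cards.
A = [319.8, 2001.6, 5146.8, 7158.2, 5965.2, 3117.3, 1039.1, 220.4, 29.2, 2.3, 0.1]
C = [341.0, 1956.4, 5134.1, 7218.1, 5952.7, 3095.4, 1046.6, 223.7, 28.6, 2.1, 0.1]


def chi_square(actual, fitted):
    """Sum of (fitted - actual)^2 / actual over the groups."""
    return sum((c - a) ** 2 / a for a, c in zip(actual, fitted))


def chi_square_tail(chi2, dof):
    """P(X > chi2) for an even number of degrees of freedom."""
    if dof % 2 != 0 or dof <= 0:
        raise ValueError("closed form used here needs a positive even dof")
    half = chi2 / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, dof // 2):
        term *= half / k
        total += term
    return exp(-half) * total


if __name__ == "__main__":
    chi2 = chi_square(A, C)
    print(chi2, chi_square_tail(chi2, 10))
```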
[Diagram II. Contour lines of 15-constant surface for whist. Axes: cards of a given suit in 1st hand; cards of the same suit in 2nd hand.]

The ordinates for this curve are given in Table III, p. 290, and compared with
those of other solutions (Biometrika, Vol. xvi. pp. 180-1*).

The superiority of the Type I curve over other representations is manifest.
This manifestation in my experience always appears when an adequate number of
categories are dealt with and a proper test of goodness of fit applied. It leads one
to believe that if the corresponding frequency surface could be determined it would
undoubtedly in like manner show its superiority over our present 15-constant
surface.

It is further obvious that as far as the marginal totals are concerned the 

* The frequencies are here calculated from the ordinates, as I have since realised that having treated 
the moments as discrete—i.e. without corrections—this was the more proper course to pursue. On 
p. 185 of the above memoir they are dealt with as areas and for totals of 1000, not 25,000. In 


Equation (xxiv), p. 182, for z, the value 29150°101 is incorrectly given. This is the modal value of the 
ordinate, but must be replaced by 10 x *824,7586 in the reduced form of the curve. The curve was 
worked from its logarithmic form and this error did not affect the computations. It was introduced 
when the non-logarithmic form of the curve was put down for publication, and the modal value of oo 
inserted by analogy—of course a false one—with Equation (xxix), p. 183. 
TABLE III.

Number of     Actual             Type I      Marginal Totals   Marginal Totals   Marginal Totals
Cards of      (Hypergeometrical) Curve       15-Constant       Filon-Isserlis    Symmetrical
given suit                                   Surface           Surface           Surface

   -2                                                                                4.0
   -1                                                              .7               46.5
    0            319.8            311.4         341.0            344.9             426.3
    1           2001.6           1984.2        1956.4           1916.1            1931.7
    2           5146.8           5192.2        5134.1           5140.4            4833.0
    3           7158.2           7161.6        7218.1           7269.6            7089.8
    4           5965.2           5928.4        5952.7           5979.6            6243.6
    5           3117.3           3117.2        3095.4           3060.0            3275.4
    6           1039.1           1052.4        1046.6           1020.7             982.8
    7            220.4            221.9         223.7            229.1             155.3
    8             29.2             27.3          28.6             35.3              11.1
    9              2.3              1.7           2.1              3.3                .3
   10              0.1              .043          0.1              0.3               0.0
   11              .0002            .0003         .00              .00               .00
   12              .0000            .0000         .00              .00               .00
   13              .0000            .0000         .00              .00               .00

χ², 0 to 10       -               1.262         3.270            12.11            155.65
P                 -                .999          .972             .350              .000,000

(The entries of the symmetrical surface for -2, -1 and 0 cards are bracketed together, with total 476.8.)


























15-constant surface gives far better results than either the Filon-Isserlis or the
symmetrical surface. The conclusions to be drawn are essentially these:

(i) No curve of frequency is adequate which does not give $\beta_1$ and $\beta_2$ the same
value for curve and data.

(ii) Of two frequency curves which both have the same $\beta_1$ and $\beta_2$ as the data,
that which approximates to a convergent series will be better than that which
approximates to a divergent series.


(9) Illustration II. I take as a second illustration the correlation table of the
contemporaneous barometric heights at Southampton and Laudale. The data are
given in Table IV (p. 291). The constants are as follows:

Southampton ($x$): Mean $\bar{x}$ = 29″.9839. Laudale ($y$): Mean $\bar{y}$ = 29″.8488.

The remaining constants are in tenths of inches as units:

$$\sigma_x = 3.250,067, \quad \sigma_y = 3.932,290,$$

$$p_{11} = 9.971,435, \quad p_{21} = 11.919,404, \quad p_{12} = 15.598,613,$$

$$p_{31} = 361.329,845, \quad p_{22} = 399.245,042, \quad p_{13} = 500.535,855;$$

whence:

$$r = .780,819, \quad q_{21} = .286,962, \quad q_{12} = .310,386,$$

$$q_{31} = 2.676,586, \quad q_{22} = 2.444,352, \quad q_{13} = 2.532,831.$$
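The passage from the product moments to the $q$'s is a matter of dividing each $p_{ij}$ by $\sigma_x^i\sigma_y^j$. The sketch below illustrates this with the barometric constants; the assignment of the printed values to $p_{21}$, $p_{12}$, $p_{31}$, $p_{22}$ and $p_{13}$ is an inference from the arithmetic (each quotient reproduces the corresponding $q$ to six figures) and the names used are illustrative.

```python
SIGMA_X = 3.250067  # Southampton, tenths of inches
SIGMA_Y = 3.932290  # Laudale, tenths of inches

# Product moments p_ij about the means (assumed index assignment).
P = {
    (2, 1): 11.919404,
    (1, 2): 15.598613,
    (3, 1): 361.329845,
    (2, 2): 399.245042,
    (1, 3): 500.535855,
}


def standardised(p_ij, i, j):
    """q_ij = p_ij / (sigma_x^i * sigma_y^j)."""
    return p_ij / (SIGMA_X**i * SIGMA_Y**j)


Q = {index: standardised(value, *index) for index, value in P.items()}

if __name__ == "__main__":
    for index in sorted(Q):
        print(index, round(Q[index], 6))
```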























TABLE IV. 


Contemporaneous Barometric Heights at Southampton and Laudale. 














Height in Inches (Central Values).


Southampton. 
For the marginal totals we have:

$${}_x\beta_1 = .171,140, \quad {}_y\beta_1 = .224,535+,$$

$${}_x\beta_2 = 3.612,028, \quad {}_y\beta_2 = 3.194,947,$$

$$\sqrt{{}_x\beta_1} = .413,691, \quad \sqrt{{}_y\beta_1} = .473,852.$$

The above lead to:

$$B_x = .025,5012, \quad B_y = .008,1228,$$

$$Q_{31} = .055,6882, \quad Q_{22} = .018,7496, \quad Q_{13} = .031,7290.$$

The total frequency $N = 2922$.
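Hence by some further laborious arithmetic the fifteen constants of
Equations (x) to (xv) and (xxxv) to (xliii) were determined and their values being
substituted in Equation (i) there resulted the following surface:

$$\begin{aligned}
z = 58.243,994\; & e^{-1.2809,9466\left(\frac{x^2}{\sigma_x^2} - 1.561,638\frac{xy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2}\right)} \\
\times \Big\{ & 1.137,4261 - .198,0935\frac{x}{\sigma_x} - .199,0118\frac{y}{\sigma_y} \\
& - .502,9167\frac{x^2}{\sigma_x^2} + .531,4255\frac{xy}{\sigma_x\sigma_y} - .186,8886\frac{y^2}{\sigma_y^2} \\
& + .234,3790\frac{x^3}{\sigma_x^3} - .249,8810\frac{x^2y}{\sigma_x^2\sigma_y} - .114,8185\frac{xy^2}{\sigma_x\sigma_y^2} + .209,3993\frac{y^3}{\sigma_y^3} \\
& + .182,1262\frac{x^4}{\sigma_x^4} - .393,6955\frac{x^3y}{\sigma_x^3\sigma_y} + .332,3992\frac{x^2y^2}{\sigma_x^2\sigma_y^2} - .199,4956\frac{xy^3}{\sigma_x\sigma_y^3} \\
& + .026,3067\frac{y^4}{\sigma_y^4} \Big\} \qquad \ldots\text{(lxxii)}.
\end{aligned}$$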


Some remarks may be made on this equation before we proceed to the computing
of the ordinates. We shall only calculate these to two decimal places; the
numerical coefficients given are not correct to their last figure, we should have had
to compute the original constants to more than six figures to obtain such accuracy.
But they are sufficiently accurate to give us correct values of the ordinates to two
decimal places, when we multiply up by large values of the higher powers. We
have taken our axes of $x$ and $y$ in the usual directions, i.e. from left to right and
downwards of the correlation table. Thus the values of $x$ and $y$ if positive denote
the tenths of inches to be subtracted from the means of $x$ and $y$ to obtain the true
barometric heights. The reader who desires $x$ and $y$ to give additive corrections to
the means has only to change the sign of both $x$ and $y$ in the above equation for $z$.
We may remark that there is no sign of convergency in the terms of the polynomial
of successive orders. Such goodness of fit as the surface may present must
depend, not on its being a convergent expansion, but on the conception that a
surface of equal volume, with the same centroid and the fifteen momental
coefficients the same as the data, cannot differ very widely from the form of the
true surface.
It will be noticed that the mode of this surface as well as that of the previous
illustration depends upon solving two simultaneous equations of the fifth order.
The only way, as far as I can see, to obtain an accurate value of the position of the
modal ordinate would be to solve these equations by successive approximation;
but a fairly reasonable value can be found from the ordinates, when these have
been calculated. This we shall now proceed to do.


Table V (p. 294) gives the central ordinates of the occupied cells of the original 
table. Although it is not correct to consider the ordinate of the cell as a measure 
of its frequency, it gives a rough appreciation and the comparison with Table IV 
compels us to adopt the view that the original data are very ragged, not impossibly 
heterogeneous*, and that it would be difficult for any continuous mathematical
surface to represent them; all it can do is to graduate them with more or less
success†.

In Table V are given the calculated ordinates of the 15-constant surface. But
the labour of calculating 450 ordinates was very great, and the deduction of the
corresponding volumes (Table VI, p. 295) added to the work. It was found to be too
toilsome to determine the amount of frequency in the extreme "marches" of the
15-constant surface. The total of the volumes showed that the uncomputed tails
of the arrays must contain some 13 units in all (actually 13.03) of frequency.
These 13 units were distributed in 93 cells of the "marches" in the manner indicated
in brackets in Table VI. It will be seen that in no case was an additional unit
placed in any cell, the average of these added cells being .14 of a unit. The
calculation of an additional 100 ordinates was thus avoided. The distribution was
of course a rough appreciation, and was done before the marginal totals had been
calculated by the tetrachoric expansions. The amounts added to the arrays are
in each case in the decimal places, and until tables of the bivariate tetrachoric
functions are available some such appreciative process of distributing the frequency
in the marches will have to be adopted. No statistician can be called upon to
calculate 500 or 600 ordinates and the corresponding volumes.

From the ordinates of the 15-constant surface (in Table V) the contour lines 
were found by a system of sections and the work was, if laborious, relatively easy ; 
the system of curves in Diagram III was obtained. We next proceeded to deal with 
the data (Table IV) themselves. The volumes were replaced by the ordinates 

* We ought probably, in the case of two barometric stations, to deal separately with the correlations 
for the winter and summer months, possibly we ought to use only three month periods. There are 
signs of at least two modes in the present correlation table. 

† Table VII, p. 298, shows the ordinates as found from the observed frequencies (volumes) by aid of
Formula (7) of the Appendix. Although this treatment of the frequencies has itself a slight smoothing
effect the appearance of negative ordinates creates some trouble when the series of sections are plotted.
It is from this table that ultimately the contours of the observation-surface were determined. The reader
will appreciate not only the difficulty of proceeding from such a series of irregular ordinates to the
contours, but how much room is left for arbitrary steps in the smoothing. Too much stress cannot
therefore be laid on the deviation between the observational and the 15-constant surface contours. It would
have been by no means difficult to admit a bias to a greater agreement to play its part, but this was
carefully guarded against. The reader may rest assured that if the contours are an individual solution of the
problem, they are a solution which is absolutely independent of the 15-constant surface contours.
calculated from the formulae in the Appendix. These are given in Table VII,
p. 298. Then these ordinates were plotted as sections, and the real trouble began*.
Although the computation of the ordinates from nine adjacent volumes in itself
exerted a smoothing effect, yet the section-curves remained erratic, being often
bimodal. Whether this was to be attributed to random sampling or to heterogeneity
in the material, it is not possible at present to say. When attempting many years
[Diagram III. Contours of 15-constant surface for barometric heights at Southampton and Laudale. Axes: height of barometer at Southampton in inches; height of barometer at Laudale in inches.]

ago a general theory of skew correlation, I found the contour lines of a number 
of observational frequency surfaces, but found them by no means easy to reduce 
to a mathematically useful smoothness. It has always been a matter of admira- 
tion to me that Galton should have deduced the system of concentric, similarly 


* See footnote, p. 293.

placed and similar ellipses of the normal surface from a table of some 350 pairs
corresponding to stature in Father and Son. Here we have nearly 3000 pairs
of variates, but fail to reach anything like so great regularity as he attained on
his far inferior numbers. In the case of the sections the curves were drawn by aid
of the spline, without smoothing out double modes and other irregularities. The
contours were plotted from these sections, and these contours were then smoothed,




[Diagram IV. Contours of observational surface for barometric heights at Southampton and Laudale. Axes: height of barometer at Southampton in inches; height of barometer at Laudale in inches.]








there being an individual drawing of each. They were occasionally asymmetrical
lemniscatoid curves! The irregularities were gradually smoothed out, and the
contours placed one on top of the other, being properly orientated. The contours were
now once more adjusted to each other. All this was done without paying any
attention to the 15-constant surface's contours. The result is seen in Diagram IV.
The observational surface is seen to be very precipitate as we pass from the
centroid* in a direction S.E. by S., and north-west of the centroid is a table-land. Here
possibly there ought to be a second low barometer mode, if there be really hetero- 
geneity in the material. It may be asked why, with this possibility before us, did 
we select barometric material? The answer is that—as first noticed by Quételet— 
the barometric frequency at a given station forms a practically important case of 
definite skew variation, that the marginal frequencies of a barometric table—or 
the frequencies of heights at single stations—do not exhibit definite traces of 
bimodalism ; and last but not least that this particular correlation table was one 
for which all the 15 constants had been determined for other purposes, and so the 
surface could be obtained without excessive labour. Owing to the “ precipice ” it 
was not possible to superpose Diagrams III and IV in one drawing. Tissues of 
III and IV are provided in the pocket at the end of this volume, and the reader 
will easily convince himself of the goodness of fit of the contour lines in the case 
of I and II and the badness of fit in the case of III and IV. But then he must 
remember that in the case of I and II, both are mathematical surfaces, while in 
the case of III and IV, III is a mathematical surface and IV is a smoothed series 
of observations, themselves subject to all the irregularities of random sampling. 
It does not necessarily follow that the 15-constant surface is a bad graduation of 
the data. A study of the data in the table suggests that no continuous mathe- 
matical surface is likely to give a first-class result. 

In view of the good success of the 15-constant surface in the case of whist, 
I do not feel wholly discouraged by the bad fit of the contours for the barometric 
data. We have not at present enough comparative material to be able to say what 
is really good or bad in fitting theoretical frequency contours to observational 
contours. 

Even before passing judgment in this case it is well to study other aspects of 
the fitted surface. I will deal first with the regression lines. 

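If $x$ be the Southampton barometric height, $y$ that of Laudale, $\bar{y}_x$ the mean
of the heights at Laudale for a given height $x$ at Southampton, and $\bar{x}_y$ the mean
of the heights at Southampton for a given height $y$ at Laudale, the origin of all
quantities being the mean heights 29″.9839 at Southampton and 29″.8488 at
Laudale, we have by Equation (lv) for the regression curves:

$$\bar{y}_x = 3.932,290\left\{.780,819\frac{x}{\sigma_x} - \frac{.018,0280\left(\frac{x^2}{\sigma_x^2} - 1\right) + .023,9590\left(\frac{x^3}{\sigma_x^3} - 3\frac{x}{\sigma_x}\right)}{1 + .068,9485\left(\frac{x^3}{\sigma_x^3} - 3\frac{x}{\sigma_x}\right) + .025,5012\left(\frac{x^4}{\sigma_x^4} - 6\frac{x^2}{\sigma_x^2} + 3\right)}\right\} \qquad \ldots\text{(lxxiii)},$$

$$\bar{x}_y = 3.250,067\left\{.780,819\frac{y}{\sigma_y} - \frac{.029,8035\left(\frac{y^2}{\sigma_y^2} - 1\right) - .006,3593\left(\frac{y^3}{\sigma_y^3} - 3\frac{y}{\sigma_y}\right)}{1 + .078,9753\left(\frac{y^3}{\sigma_y^3} - 3\frac{y}{\sigma_y}\right) + .008,1228\left(\frac{y^4}{\sigma_y^4} - 6\frac{y^2}{\sigma_y^2} + 3\right)}\right\} \qquad \ldots\text{(lxxiv)}.$$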

* By an oversight the centroid has been omitted in Diagrams III and IV, but the reader will easily
insert it from the values of its coordinates on p. 290, i.e. Southampton 29″.984, Laudale 29″.849.
The labour of computing $\bar{y}_x$ and $\bar{x}_y$ was considerable, but was accomplished for
each .1″ of barometric height. The results are plotted in Diagrams V and VI,
which show the observations, the regression curves of the 15-constant surface and
the regression straight lines. However algebraically complicated the regression
curves, and geometrically quaint their graphs, there is small doubt that they fit the
observation points very well, and possibly better than the regression straight lines.

I showed in 1897* that Type III frequency curves fitted to barometric data 
gave a physical limit to high barometer at each station. This limit was 31".511 at 
Laudale. The marginal total frequency curve at Laudale becomes negative at 
30".972. This is shown by the denominator in the second part of the value of $\bar{x}_y$ 
becoming zero. It follows that the regression quintic is asymptotic to the line 
y = 30".972; it is shown turning up toward this line in Diagram VI. The asymptote 
naturally lies at a height outside experience, the maximum observed height at 
Laudale being 30".796. No similar asymptote was found in the case of the 
Southampton marginal frequency†. 
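
The height at which the Laudale marginal curve changes sign can be checked numerically. What follows is only a minimal sketch, not the computation made for the paper: it assumes the marginal factor has the form 1 + a·H3(t) + b·H4(t) with the coefficients of Equation (lxxvi), takes the Laudale standard deviation as 0".3932290 from the regression equations, and assumes that t is counted negative for heights above the mean. The function names are hypothetical.

```python
# Sketch: locate the height at which the Laudale marginal factor vanishes.
# Assumptions (not stated in this form in the paper): the factor is
# 1 + A3*H3(t) + A4*H4(t), with t = (mean - height) / sigma.

A3 = 0.0789753      # third-order coefficient for Laudale, Equation (lxxvi)
A4 = 0.0081228      # fourth-order coefficient for Laudale, Equation (lxxvi)
MEAN = 29.8488      # mean height at Laudale, inches
SIGMA = 0.3932290   # standard deviation at Laudale, inches (assumed reading)


def marginal_factor(t):
    """Polynomial factor multiplying the normal ordinate in the marginal curve."""
    h3 = t**3 - 3.0 * t
    h4 = t**4 - 6.0 * t**2 + 3.0
    return 1.0 + A3 * h3 + A4 * h4


def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; f(lo) and f(hi) must have opposite signs."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("no sign change in the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


# The bracket lies on the high-barometer side (t negative).
t_zero = bisect(marginal_factor, -3.2, -2.5)
height_zero = MEAN - SIGMA * t_zero
print(round(height_zero, 3))
```

Under these assumptions the root falls close to the height 30".972 quoted in the text, beyond the highest observed reading at Laudale.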

The reader is asked to consider carefully the significance of the fact that these 
regression curves have not been fitted by determining their five constants so as to 
give the best fit of the curves to the array means. The array means are not 
directly used in their determination. They are, so to speak, a bye-product of the 
15-constant surface and depend on the observational moments and product 
moments. It is this indirectness of fitting which makes their goodness of fit, 
within, of course, the limits of observation, really a strong point for the 15-constant 
surface. Outside the limits of observation they may in this case and probably in 
others act eccentrically. They cannot be said to have a strong theoretical foundation, 
but granted the equality of the 15 constants, then they are giving a quite 
reasonable result. Knowing how closely the regression was linear (the two η's and 
r being nearly equal) it was a surprise to find how well the quintics bent to their 
tasks! 


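I will now turn to the marginal totals. We have from Equations (xlvi) and (xlvii): 

$$\int_{-\infty}^{x} z_x\,dx = N\left\{ {}_x\tau_0 - .068{,}9485\;{}_x\tau_3' - .025{,}5012\;{}_x\tau_4' \right\} \quad \text{(lxxv)},$$

$$\int_{-\infty}^{y} z_y\,dy = N\left\{ {}_y\tau_0 - .078{,}9753\;{}_y\tau_3' - .008{,}1228\;{}_y\tau_4' \right\} \quad \text{(lxxvi)},$$

and the fundamental property 

$$\int_{-\infty}^{x} \tau_s\,dx = -\frac{1}{\sqrt{s}}\,\tau_{s-1}(x).$$

Here $x'$ and $y'$ stand for $x/\sigma_1$ and $y/\sigma_2$, ${}_x\tau_0$ and ${}_y\tau_0$ for the corresponding probability 
integrals, i.e. $\tfrac12(1+\alpha_x)$ and $\tfrac12(1+\alpha_y)$ of the usual notation, while 

$${}_x\tau_3' = \frac{1}{\sqrt{2\pi}}\,e^{-\frac12 x'^2}(x'^2-1), \qquad {}_x\tau_4' = \frac{1}{\sqrt{2\pi}}\,e^{-\frac12 x'^2}(x'^3-3x'),$$

$${}_y\tau_3' = \frac{1}{\sqrt{2\pi}}\,e^{-\frac12 y'^2}(y'^2-1), \qquad {}_y\tau_4' = \frac{1}{\sqrt{2\pi}}\,e^{-\frac12 y'^2}(y'^3-3y').$$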
* Phil. Trans. Vol. 190, A, pp. 435, 446—8. 
† The reader will notice that on the low barometer side the quintics are asymptotic to the respective 
regression straight lines. 
Unfortunately the Tables for Statisticians and Biometricians, Part I, give the 
tetrachoric functions for argument $\tfrac12(1-\alpha_x)$ and not for $x$, and so are not best 
suited to our present purpose*. We have been obliged therefore to use other 
tables. Perhaps the most convenient are those of Professor James W. Glover, 
entitled: Tables of Applied Mathematics in Finance, Insurance, Statistics, Ann 
Arbor, 1923. 


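He tables the derivatives of $\phi(t) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac12 t^2}$ as $\phi'(t)$, $\phi''(t)$, $\phi'''(t)$, etc. The 
relation between the integral of $\tau_s$ and Glover's $\phi^{(s)}$ is: 

$$\int_{-\infty}^{x} \tau_s\,dx = (-1)^{s-1}\,\frac{1}{\sqrt{s!}}\,\phi^{(s-2)}(t),$$

where $t = x/\sigma_1$. Accordingly we find for Southampton: 

$$\int_{-\infty}^{t} z_x\,dx = 2922\left\{ \tfrac12(1+\alpha) - .068{,}9485\,\phi''(t) + .025{,}5012\,\phi'''(t) \right\},$$

and for Laudale, if $t = y/\sigma_2$: 

$$\int_{-\infty}^{t} z_y\,dy = 2922\left\{ \tfrac12(1+\alpha) - .078{,}9753\,\phi''(t) + .008{,}1228\,\phi'''(t) \right\},$$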


where we must remember that: 

(a) for t negative, e.g. barometer for Southampton above mean 29.9839, we 
must use $\tfrac12(1-\alpha)$ for $\tfrac12(1+\alpha)$; and further: 

(b) for t negative $\phi'''(t)$ must have the opposite sign to its tabled value. 
$\phi''(t)$ for t negative retains the same sign as in Glover's Tables. 

If these points be borne in mind it is straightforward, but laborious, to compute 
the probability integrals of the marginal total, and by subtraction of their values 
to obtain the cell frequencies. The following table gives the results with com- 
parison of the margins for Rhodes’ surface. 
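
The computation just described can be sketched in a few lines. This is an illustration only, under stated assumptions: the derivatives of the normal ordinate are evaluated analytically instead of being read from Glover's tables (so the two sign rules above are taken care of by the functions themselves), the Southampton standard deviation is taken as 0".3250067 from the regression equations, and t is counted negative above the mean. All function names are hypothetical.

```python
import math

# Sketch of the marginal probability integral and the cell frequencies got
# from it by subtraction. Assumed constants: coefficients from Equation
# (lxxv); mean and standard deviation for Southampton as read from the text.
N = 2922
MEAN_S = 29.9839       # inches
SIGMA_S = 0.3250067    # inches (assumed reading)
A3_S = 0.0689485
A4_S = 0.0255012


def phi(t):
    """Normal ordinate."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def phi2(t):
    """Second derivative of the normal ordinate (an even function of t)."""
    return (t * t - 1.0) * phi(t)


def phi3(t):
    """Third derivative of the normal ordinate (an odd function of t)."""
    return (3.0 * t - t**3) * phi(t)


def normal_integral(t):
    """Probability integral, i.e. (1 + alpha)/2 for positive t."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def cumulative(t, a3, a4, n=N):
    """n * {(1 + alpha)/2 - a3 * phi''(t) + a4 * phi'''(t)}."""
    return n * (normal_integral(t) - a3 * phi2(t) + a4 * phi3(t))


def cell_frequency(h_low, h_high, mean, sigma, a3, a4):
    """Frequency between two heights; t decreases as the height rises."""
    t_at_high = (mean - h_high) / sigma
    t_at_low = (mean - h_low) / sigma
    return cumulative(t_at_low, a3, a4) - cumulative(t_at_high, a3, a4)


# Cell centred on 30".0 at Southampton.
f_30_0 = cell_frequency(29.95, 30.05, MEAN_S, SIGMA_S, A3_S, A4_S)
print(round(f_30_0, 1))
```

The value printed for the cell centred on 30".0 may be compared with the tetrachoric entry 387.6 for Southampton in Table VIII; a small difference is to be expected, since the constants used here are read from the text and not taken from the original worksheets.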

Mr E. S. Pearson for Mr Rhodes' groups finds the values P = .59 and .62 for 
the Southampton and Laudale frequencies fitted with Pearson skew curves. The 
results do not give one the confidence we might desire in the tetrachoric series for 
a skew univariate frequency. The tetrachoric gives a good result for Southampton, 
where the Rhodes' surface marginal total is poor; it gives a poor result for Laudale, 
where the fit of the Rhodes' surface margin is passable. One would have thought 
that the equalising of the four moment coefficients would have given a better 
result than the equalising of only three. It certainly does so, when Pearson skew 
curves are used. I have reworked independently Equation (lxxvi) and recomputed 
the cell frequencies, but only to reach the same result. The chief region of error 
is from height 29".9 to height 30".4, but I can see no reason for the fall here. 
We are bound I think to conclude that although it is better to equalise four than 
three moments, the form of the curve fitted to the frequency may count for a good 


* The Second Part of the Tables for Statisticians will contain a full table for the tetrachoric functions 
to x as argument in our form. See also Dr Alice Lee's Table in the present issue, pp. 351—4. 
TABLE VIII. 

Marginal Barometric Table Totals. 

SOUTHAMPTON (Computed) 

            Tetrachoric          Rhodes 
               3.1 }               0.1 } 
               3.6 } 15.5          1.2 } 6.9 
               8.8 }               5.6 } 
              21.2                19.3 
              47.2                50.7 
              92.9               105.1 
             161.2               178.9 
             244.0               257.9 
             324.1               322.0 
             377.5               359.9 
             387.6               359.4 
             350.4               327.5 
             282.4               275.3 
             206.2               215.0 
             140.4               157.7 
              93.4               108.9 
              63.0                71.4 
              43.6                44.7 
              29.8                26.4 
              19.3                15.0 
              11.5                 8.5 
               6.2                 4.5 
               4.9                 3.8 
Totals        2922               2918.8 
χ²         = 18.024            = 29.942 
P          = .586              = .071* 

LAUDALE 

            Observed      Computed      Computed 
                          Tetrachoric   Rhodes 
               1   } 
               2   }          …             … 
              14             19.9          15.2 
              36             41.4          36.8 
              64             74.7          71.6 
             141            119.6         124.0 
             200            173.0         181.3 
             263            227.7         236.4 
             260.5          274.0         278.6 
             277.5          302.9         300.6 
             283.5          308.9         302.0 
             277.5          291.9         284.0 
             245            256.8         251.6 
             212            212.6         212.8 
             192            167.4         171.6 
             135            127.4         132.0 
              97.5           94.8          98.8 
              67.5           69.8          71.3 
              63             50.9          50.0 
              38.5           36.4          34.0 
              24.5           25.2          22.6 
              11             16.6          14.7 
               7.5           10.4           9.2 
               4.5 } 
               2.5 }          …             … 
               …   } 
Totals        2922           2922          2918.8 
χ² (n' = 23)              = 39.086      = 24.466 
P                         = .087        = .325* 




deal. It is not possible to discuss directly the marginal curve of the Rhodes' 
surface; it is not yet expressed in a finite form, but it appears to be an expansion 
in terms of Pearson's Type III Curve and its differential coefficients, as the tetrachoric 
expansion is of the normal curve and its differential coefficients†. Its fitness 
for describing univariate frequencies might well be discussed. 


* Mr Rhodes gives the values .2 and .4 for P, using 19 groups for Southampton and 21 for Laudale 
(Biometrika, Vol. xiv. p. 374). Using his groups I find for Southampton: P = .110 and for Laudale: 
P = .295. 

† See Romanovsky, Biometrika, Vol. xvi. p. 114 et seq. 
The following table indicates the values of the marginal totals as found (i) from 
the tetrachoric expansion and (ii) from the volumes as deduced by the formulae 
of the Appendix from the ordinates (Table V) together with the distribution made 
for the marches (see Table VI). 


TABLE IX. 

Computed. 

Barometric |               SOUTHAMPTON                      |            LAUDALE 
Height     | Main      Appreciated  Totals   Tetrachoric   | Main      Appreciated  Totals 
           | Surface   Marches               Expansion     | Surface   Marches 
31.1       |    —         .40         .40       …          | 
31.0       |    .76       .81        1.57       …          |    —         .58         .58 
30.9       |   3.14       .44        3.58       3.6        |   2.27       .35        2.62 
30.8       |   8.57       .54        9.11       8.8        |   7.64       .67        8.31 
30.7       |  21.32       .41       21.73      21.2        |  18.81       .64       19.45 
30.6       |  47.25       .60       47.85      47.2        |  41.14       .55       41.69 
30.5       |  93.17       .69       93.86      92.9        |  74.10       .92       75.02 
30.4       | 161.21       .83      162.04     161.2        | 118.82       .70      119.52 
30.3       | 243.44       .55      243.99     244.0        | 172.26       .93      173.19 
30.2       | 324.58       .30      324.88     324.1        | 226.95       .70      227.65 
30.1       | 377.62       .30      377.92     377.5        | 273.13       .86      273.99 
30.0       | 387.42       .45      387.87     387.6        | 302.43       .55      302.98 
29.9       | 350.16       .75      350.91     350.4        | 307.87       .55      308.42 
29.8       | 276.96       .40      277.36     282.4        | 290.60       .40      291.00 
29.7       | 205.73       .14      205.87     206.2        | 255.72       .25      255.97 
29.6       | 140.08       .44      140.52     140.4        | 211.48       .40      211.88 
29.5       |  92.89       .72       93.61      93.4        | 166.50       .29      166.79 
29.4       |  62.54       .88       63.42      63.0        | 126.44       .20      126.64 
29.3       |  43.00       .56       43.56      43.6        |  94.41       .09       94.50 
29.2       |  29.54       .47       30.01      29.8        |  69.48       .20       69.68 
29.1       |  19.25       .05       19.30      19.3        |  50.59       .30       50.89 
29.0       |  11.41       .15       11.56      11.5        |  36.12       .30       36.42 
28.9       |   6.03       .20        6.23       6.2        |  24.83       .50       25.33 
28.8       |   2.90       .00        2.90    }             |  16.23       .55       16.78 
28.7       |    —        1.70        1.70    }  4.9        |  10.05       .30       10.35 
28.6       |    —         .25         .25    }             |   5.80       .20        6.00 
28.5       |    —          —           —         —         |   3.08       .20        3.28 
28.4       |    —          —           —         —         |   1.32       .44        1.76 
28.3       |    —          —           —         —         |    .57       .24         .81 
28.2       |    —          —           —         —         |    .22       .13         .35 
28.1       |    —          —           —         —         |    .08       .04         .12 
28.0       |    —          —           —         —         |    .03       .00         .03 
Totals     | 2908.97     13.03      2922       2922        | 2908.97     13.03      2922 

Laudale, Tetrachoric Expansion: … Total 2922. 





I think we must conclude from these results that our method of computing 
volumes from ordinates is in the main very satisfactory. The small disagreement 
at the tops of the columns is largely due to our not computing and then allowing 
for the negative ordinates of the 15-constant surface, while they appear in the 
tetrachoric marginal totals. 

General Goodness of Fit of the 15-Constant Surface. We may now consider the 
problem of fit apart from the contour lines and the marginal totals, i.e. by the 
P, χ² test for goodness of fit. Unluckily our goodness of fit tables (Tables for 
Statisticians and Biometricians, Table XII, p. 28) are limited at present to 30 cells*, 
whereas for a thorough test in this matter it would be desirable to work with 80 to 
100 frequency groups at least, but the numerical work thus involved cannot at present 
be faced. Accordingly I was reluctantly compelled to reduce the cell frequencies 
to 30 or less. If this must be done, I concluded that it might be just as well to use 
the 24 groups adopted by Mr Rhodes, as to do so would enable me to compare the 
general efficiency of his skew surface with the present one. Mr Rhodes' surface 
has nine available constants. One of these goes in determining the volume; two 
others in making his origin the mode. He is thus left with six constants which 
may be determined from $\sigma_1$, $\sigma_2$, ${}_x\beta_1$, ${}_y\beta_1$, ${}_x\beta_2$, ${}_y\beta_2$. It is clear therefore that $q_{21}$, 
$q_{12}$ and, I think, r are functions of these marginal constants, or the correlational 
constants even to the second and third orders are not independent of the marginal 
variations†. The 15-constant surface which leaves absolute freedom to these 
constants ought therefore to give a better fit than Mr Rhodes' surface. 

* This was fully adequate in the case of univariate frequency for which the tables were originally 
computed. 

† Using our notation for the β's we have with Mr Rhodes' symbols θ, φ, λ: 


2 82-3 81-6 _die(p—1)}? 


3,8, anata tea has ca (a), 

2 yB.-3 yB,-6 _d{0(0-1)?2 
“= = ne eee eee ee eee reese reese eeeeesesseeeees l , 
3B, (+d)? 0) 
Ce ek | Te (c). 


These three equations theoretically suffice to determine θ, φ and λ in terms of the marginal variation 
constants, ${}_x\beta_1$, ${}_y\beta_1$, ${}_x\beta_2$ and ${}_y\beta_2$. 
(0+) 


But PE nes han ne cons nsce vncescinenctsesdoyeeonaesess ves (d), 
w/(@2+) (¢? +)) 

G,=P J Bh + J BF pla 6 le — Oh ec rnviccesicsvececctusonaccassessteeed (e), 

ee ee, en ee ee) (f). 


Thus not only r but $q_{21}$ and $q_{12}$ are functions of the β's of the marginal totals, and the same objections 
hold as in the case of the Filon-Isserlis surface. Actually Mr Rhodes pays no attention to fourth 
moments and would pass over Equations (a) and (b) above, deducing his values of θ, φ and λ from (c) 
and (e) and (f) by replacing $2\beta_2 - 3\beta_1 - 6$ in those equations by the values for it in (a) and (b); thus 
indirectly the ${}_x\beta_2$ and ${}_y\beta_2$ are involved. I do not know what values should be given to the sign of the 
second radicals in (e) and (f), but if we substitute the values of r and the β's observed for Southampton 
and Laudale, we find 

$q_{21}$ = .323,018 ± .304,071 
       = .018,947 or .627,089, 

either of which is very far from the observed value .286,962. On the other hand $2\,{}_y\beta_2 - 3\,{}_y\beta_1 - 6$ is negative 
or the value of $q_{12}$ is imaginary. Hence neither of the conditions connecting $q_{21}$ and $q_{12}$ with the β's of 
the marginal totals (and these might be treated as criteria justifying the use of the surface) is even 
approximately satisfied by the Southampton-Laudale data. If we assume that $q_{21}$ and $q_{12}$ have their 
observed values, then we find from the Rhodes' surface: 

${}_x\beta_2$ = 3.261,705 and ${}_y\beta_2$ = 3.350,457 

against the observed values: 

Southampton 3.612,028 and Laudale 3.194,947. 

The divergence is quite sufficient to account for the inferiority in the fit of the marginal totals when 
we compare the P-values with those of curves having both $\beta_2$'s equalised with the $\beta_2$'s of the data (see 
Biometrika, Vol. xiv. p. 374). At the same time the probable errors of these $\beta_2$'s are only about half the 
differences between the observed and computed values. One great advantage of Mr Rhodes' surface lies 
in the relative ease with which the ordinates and the corresponding frequencies can be calculated. 
The accompanying table provides the frequencies of the 15-constant surface, 
of the Rhodes' surface and of the data in 24 groups. 

TABLE X. 

Abridged Table showing the Correlated Data for Barometric Heights at 
Southampton and Laudale in 24 Groups. 

Height of Barometer at Southampton in Inches. 

—30°7 


| O. 19°25 | O. 30°25 


30°C T. 19°0 


R. 16°9 





| 306—30°4 30°3—-30°1 | 830°0— 298 | 29°7—29°5 | 2 








oO. 
rT. 
R. 


305 —80°3 


| 
| 30°2—30°0 


29°9—29°7 


293 


29°1 


Height of Barometer at Laudale in Inches. 


29°00 —28°8 


O. = Observed Frequency ; 

















29° 294-292) 29-1— 
nm =a Slee sea ge Pee TT 
| T. 44-8 | 
| R. 366 0. 2185 | O. 17-25 | | 
Peake su T. 189°3|T. 229 | 
1715 R. 170°4 | R. 19°1 | O. 20°25 
162°9 T. 26:9 
191°7 R, 24:1 

O. 432°5 | O. 247-0 | 
T. 450°2 | T. 231-3 | 
118°5 R. 462°8 | R. 223°9 | 
1ILL°8 — — = arg a — 
115°5 O. 240° | O. 422°75 | O. 106-25 
| T. 257°8 | T. 449°9 | T. 116-7 
R. 262-0 | R. 4225 | R. 13071 | O. 645 
———— |__| reo | 
O. 51°5 O. 255°5 | O. 1870 | 2. 58°3 | 
T. 46°9 T. 248°7 | T. 162°0 | | 
R. 42:7 R. 239°5 | R. 188-7 | | 
eee ra = i Re ieee th fens rd | | 
10. 1085 | 0. 450 0. 80 
iT. 1006 | T. 39°7 T. 14-2 
|R. 1070 | R. 495 | R. 92 
O. 72:5 | O. 11°5 
T. 67-2 T. 13-4 
R. 59°2 O. 29°0 | O. 365 | R. 100 
T. 35°3 | T. 37-2 
R. 32°4 R. 38°3) O. 80 
| T. 90 
8-4 


T. = 15-Constant Surface Frequency; 
R. = Rhodes' Surface Frequency. 

Totals: O. = 2922; T. = 2921.7. 
χ² (24 cells): for T. = 29.092; for R. = 36.328. 
P: for T. = .1775; for R. = .0646*. 


It will be seen that the 15-constant surface has three to four times (if we 
accept Mr Rhodes' P = .04) the goodness of fit of Rhodes' surface. I am inclined 
to think the fit would have been substantially better had the blocking up of the 
central cells not been so extensive. In the cell Southampton 29".7—29".5, 
Laudale 29".6—29".4, there is possibly some slip in either the 15-constant surface 
or the Rhodes' surface value. I have checked the former without discovering any 
error. 

* Mr Rhodes gives (Biometrika, Vol. xiv. p. 375) the value .04 for P. My value somewhat increases 
his, but my arithmetic has not been checked. 





The agreement of the 15-constant surface with the data is not as good as one 
might have desired, but it is probably all that could have been anticipated having 
regard to the nature of the data. If the 15-constant surface actually describes the 
material sampled, we should expect a worse result than that observed in one out 
of five or six trials. Such a result is sufficiently encouraging to make it worth 
while investigating the 15-constant surface on further material. It is, for reasons 
already stated, not an ideal solution of the bivariate frequency surface, but it is 
better than anything thus far attempted. Its freedom, as far as all momental 
constants up to the fourth order are concerned, renders it much more efficient than 
any surface can be which has arbitrary relations between the fourth and third 
order momental constants. 
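
The statement that a worse result would arise in one out of five or six trials can be checked by evaluating P directly from χ². The sketch below is not the tabular method used in the paper; it sums the usual series for the incomplete gamma function, and it assumes that a table of 24 cells is entered with 23 degrees of freedom. The function name is hypothetical.

```python
import math


def chi_square_p(chi2, degrees_of_freedom):
    """Probability of a chi-square value at least as large as `chi2`.

    Uses the power series for the regularised lower incomplete gamma
    function, which converges for every positive argument.
    """
    if chi2 <= 0.0:
        return 1.0
    a = 0.5 * degrees_of_freedom
    y = 0.5 * chi2
    term = 1.0
    total = 1.0
    n = 1
    while term > 1e-16 * total:
        term *= y / (a + n)
        total += term
        n += 1
    log_prefix = a * math.log(y) - y - math.lgamma(a + 1.0)
    return 1.0 - math.exp(log_prefix) * total


# 15-constant surface, 24 cells; 23 degrees of freedom is an assumption here.
p_15_constant = chi_square_p(29.092, 23)
print(round(p_15_constant, 4), round(1.0 / p_15_constant, 2))
```

The reciprocal of P is the number of trials in which one worse fit is to be expected, which is the sense of "one out of five or six" in the text.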





(10) On a practically adequate Method of finding the Mode of the 15-Constant 
Surface. I will illustrate this on the present surface for the barometric data. 

After the ordinates have been tabled we pick out the three arrays one way in 
which the mode must lie; in our case we have: 

                        Southampton.                    Scheme of Ordinates 
Laudale        30".2      30".1      30".0 
30".2          52.21        —          —          $z_{-1}$       —          — 
30".1          59.84      59.67        —          $z_0$       $z_{-1}$       — 
30".0          55.34      67.98      59.62        $z_1$       $z_0$       $z_{-1}$ 
29".9            —        63.02      67.44          —        $z_1$       $z_0$ 
29".8            —          —        62.73          —          —        $z_1$ 

We now determine the position of the vertex of the parabola passing through 
each triplet of ordinates. If we have three ordinates equally spaced, $z_{-1}$, $z_0$, $z_1$, 
then the distance η of abscissa corresponding to vertex from the foot of central 
ordinate* is: 

$$\eta = \tfrac12\,\frac{z_1 - z_{-1}}{2z_0 - z_1 - z_{-1}},$$

and the maximum ordinate 

$$Z = z_0 + \tfrac18\,\frac{(z_1 - z_{-1})^2}{2z_0 - z_1 - z_{-1}}.$$

We now apply these formulae to the above three triplets and determine the 
mode in each case. Remembering that our unit is 1/10" we find for position of 
Laudale modes: 

* In the old days I found these formulae quite useful in determining the position of the meridian and 
the latitude from three circum-meridian observations of the altitudes of the sun with the sextant. 


                               Southampton. 
                      30".2         30".1         30".0 
Laudale Mode:         30".08710     29".98745     29".88759 
Modal Frequency:      59.94095      68.08571      67.53649 

Now it will be seen that the Laudale modes for the three constant values at 
Southampton are nearly on a straight line; for practical purposes it will suffice to 
take them on a line at distances .141245 apart. Accordingly we proceed to find 
the mode of the above three modes in the same manner; and we find it to be at 
a point on the line joining the three above modes between the second and third 
at distance from the second .43683 of the interval between them. This gives for 
the position of the maximum ordinate*: 

Southampton: 30".0563, Laudale: 29".9439, 

while the modal ordinate is 68.91519. 

Working with the Laudale arrays instead of those for Southampton I found: 

Southampton: 30".0579, Laudale: 29".9483, 

and the modal ordinate 68.89727. 

The means from both determinations: 

Southampton: 30".0571, Laudale: 29".9461, 

and the modal ordinate 68.9062, will I believe be very close to the true values for 
the 15-constant surface. 
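
The two-stage process of this section can be retraced with a short routine. This is a sketch under stated assumptions: the vertex formulae are the ordinary ones for a parabola through three equally spaced ordinates, the ordinates are those tabled above for the Southampton columns, and the function name is hypothetical.

```python
def parabola_vertex(z_minus, z_centre, z_plus):
    """Vertex of the parabola through three equally spaced ordinates.

    Returns (offset, peak): `offset` is the distance of the vertex from the
    central ordinate, in units of the spacing and counted towards z_plus;
    `peak` is the maximum ordinate.
    """
    curvature = 2.0 * z_centre - z_plus - z_minus
    if curvature <= 0.0:
        raise ValueError("the triplet has no interior maximum")
    offset = 0.5 * (z_plus - z_minus) / curvature
    peak = z_centre + (z_plus - z_minus) ** 2 / (8.0 * curvature)
    return offset, peak


# First stage: one triplet for each Southampton column. Each entry holds the
# Laudale height of the central ordinate and the ordinates in the order of
# falling Laudale height (step of 0".1 downwards).
triplets = {
    30.2: (30.1, (52.21, 59.84, 55.34)),
    30.1: (30.0, (59.67, 67.98, 63.02)),
    30.0: (29.9, (59.62, 67.44, 62.73)),
}
laudale_modes = {}
modal_frequencies = {}
for southampton, (centre, ordinates) in triplets.items():
    offset, peak = parabola_vertex(*ordinates)
    laudale_modes[southampton] = centre - 0.1 * offset
    modal_frequencies[southampton] = peak

# Second stage: the three modal frequencies are treated as a new triplet.
final_offset, final_peak = parabola_vertex(
    modal_frequencies[30.2], modal_frequencies[30.1], modal_frequencies[30.0]
)
southampton_mode = 30.1 - 0.1 * final_offset
print(round(final_offset, 5), round(southampton_mode, 4), round(final_peak, 3))
```

The second stage takes the three array modes as equally spaced along their line, which is the approximation adopted in the text.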


(11) Conclusions. While I believe expansions in tetrachoric functions, whether 
univariate or bivariate, have neither from the standpoint of the theory of probability 
nor from the theory of functions any demonstrable validity—for in the case of 
frequency expansions they do not usually converge—they have some practical value 
depending upon the fact that any two expressions for which some of the momental 
constants are equal, will show more or less agreement between their values, 
according to the number of momental constants equalised. Yet if two systems 
be taken to represent a frequency distribution by equalising the same number of 
momental constants, one may be much better than the other—the one that is 
ultimately based on a converging series will give the better result when a proper 
goodness of fit test is applied. Many curves can be propounded which will within 
a given range give four or more momental constants equal to those of a given 
frequency distribution, but I have not found any yet which on the whole approach 
frequency distributions with the same good accuracy of fit as the skew curves 
I have proposed†. They certainly give far better results than tetrachoric expansions 
taken up to the same number of equalised moment coefficients. I feel 





* Mr Rhodes gives Southampton 30".10845, Laudale 29".99315 and modal ordinate 64.87113. 

† Some of the comparisons made between my skew curves and tetrachoric expansions are not worth 
discussing; they consist in splitting the frequency up into five or six groups and making no real test 
of the tails at all. A parabola of the fifth or sixth order could be made to fit such groupings even more 
successfully! 
quite confident that if a surface could be determined of the same character as the 
curves with the same number of free momental constants, it would be far more 
satisfactory than an expansion in bivariate tetrachoric functions. But after many 
years of investigation this has not been achieved; and for the present we are thrown 
back on expansions by bivariate tetrachoric functions. The present paper shows 
that: 


(a) The 15-constant tetrachoric expansion gives as judged by the contour lines 
an excellent expression for a mathematical bivariate distribution such as the 
whist double hypergeometrical series. 


(b) For an actual series of observations it appears less successful, but this 
series is a case wherein there is some suspicion of heterogeneity, and after all its 
goodness of fit as measured by P = .18 means that one out of every five to six such 
series would give a worse result. This is about three times as good as the 
Rhodes’ surface result. 


It appears to me therefore that until something much better is forthcoming 
the tetrachoric bivariate expansion must be used. 


It will certainly give much worse results than have been obtained in this 
paper, if we use it equating only the volume N and the constants $\bar{x}$, $\bar{y}$, $\sigma_1$, $\sigma_2$, 
r, ${}_x\beta_1$, ${}_y\beta_1$, $q_{21}$ and $q_{12}$. As a univariate distribution requires for reasonable fit all 
the momental constants up to the fourth, so a bivariate distribution requires also 
all the momental constants up to the fourth, i.e. we require a 15-constant surface. 


But the present work has demonstrated how great is the labour involved. It 
is not only the computing of all the moments up to the fourth of the data and 
correcting them for grouping; this is arduous enough. But when the numerical 
form of the surface has been obtained, it is practically of no value until: (a) the 
ordinates have been computed, and (b) from these ordinates the theoretical cell 
contents have been deduced. 


After the experience we have had with this paper we can hardly recommend 
the process to statisticians, the labour is far too strenuous. But taking this paper 
to show that the contour-system of the 15-constant surface is remarkably flexible, 
and capable of giving something approximating to a fit, if not of the best, is there 
any way in which help can be given to the statistician ?—Clearly there is. He 
could be largely aided by a computation of the bivariate tetrachoric functions 
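exhibited in Equation (xlv). There are ten functions involved here and if we 
extract the factor Z they depend solely on $x' = x/\sigma_1$, $y' = y/\sigma_2$ and r. These 
functions are 

$$\frac{d^3Z}{dx'^3},\quad \frac{d^3Z}{dx'^2\,dy'},\quad \frac{d^3Z}{dx'\,dy'^2},\quad \frac{d^3Z}{dy'^3}$$

and 

$$\frac{d^4Z}{dx'^4},\quad \frac{d^4Z}{dx'^3\,dy'},\quad \frac{d^4Z}{dx'^2\,dy'^2},\quad \frac{d^4Z}{dx'\,dy'^3},\quad \frac{d^4Z}{dy'^4}.$$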
Probably a range of x from 0 to $4\sigma_1$* by 1/5, positive and negative, with a like 
range for y, and one for r from −1 to +1 by .05 would be adequate, and clearly only 
six functions need be computed. This means a table of 6 × (41)³, or 413,526, entries, 
or supposing a page of 7 columns of 60 entries each, a volume of 985 pages. 
The computation would not be hard if Z were once calculated, especially if it 
could be cooperatively organised. But would there be any chance of finding the 
funds needful for publication? I fear not, until there is greater recognition of 
the importance of statistical research. It is clear that the tabulation of Z in 
itself would be of much value. At first sight it might appear as if volumes in the 
x and y argument ranges would be more valuable than the ordinates. But on 
consideration it will be seen that these are unlikely to coincide with the sub- 
ranges of the actual data. I believe it would be best to table ordinates, to work 
from Equation (xlv) to the ordinates of the 15-constant surface at the required 
points, and then by some formulae similar to those of my Appendix, deduce the 
cell frequencies. While the computation might, as I have said, be possible by 
cooperation, it is the cost of publication which will block the way. Nevertheless, 
I cannot believe, that except in very special inquiries, the isolated statistician 
will feel able to give the labour and the time needful to fit a 15-constant surface 
to a skew-correlation table until tables of the bivariate tetrachoric functions are 
available. That is, I think, where we must leave the matter for the present. 
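
The size of the proposed table follows from simple counting, which may be set out as below. This is only an arithmetical sketch; the step of one-fifth of the standard deviation for x and y is the reading that gives 41 values of each argument, and the function name is hypothetical.

```python
import math


def table_size(values_x, values_y, values_r, functions, columns, rows):
    """Number of entries and of printed pages for a triple-entry table."""
    entries = functions * values_x * values_y * values_r
    pages = math.ceil(entries / (columns * rows))
    return entries, pages


# -4 to +4 standard deviations by steps of 1/5 gives 41 arguments;
# r from -1 to +1 by steps of .05 gives 41 arguments as well.
values_xy = round(8 / 0.2) + 1
values_r = round(2 / 0.05) + 1

entries, pages = table_size(values_xy, values_xy, values_r,
                            functions=6, columns=7, rows=60)
print(values_xy, values_r, entries, pages)
```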


While the general theory of the 15-constant surface was given by me in the 
lectures of last session, I have to express my obligation to Mr Wishart for checking 
my algebra; to Miss E. M. Elderton and Miss M. Moul I owe a deep debt for their 
services in calculating ordinates and cell frequencies, while Miss McLearn has 
struggled most valiantly in the production of the contours, those for the observed 
barometric data being of a most difficult, trying and uncertain character, which 
only her skill and patience have to some extent surmounted. 


APPENDIX. 

On Methods of proceeding from Cell Frequencies to Ordinates, or from Ordinates 
to Cell Frequencies, in the case of a bivariate Frequency Surface. 

The Lagrangian interpolation formula which corresponds to the midpanel 
central difference formula up to and including second order differences takes the 
following form†: 

$$\begin{aligned}
z = {} & z_{0,0}(1-x^2)(1-y^2) - \tfrac12 y(1-y)(1-x^2)\,z_{0,-1} + \tfrac12 y(1+y)(1-x^2)\,z_{0,1} \\
& - \tfrac12 x(1-x)(1-y^2)\,z_{-1,0} + \tfrac12 x(1+x)(1-y^2)\,z_{1,0} \\
& + \tfrac14 xy(1+x)(1+y)\,z_{1,1} + \tfrac14 xy(1-x)(1-y)\,z_{-1,-1} \\
& - \tfrac14 xy(1+x)(1-y)\,z_{1,-1} - \tfrac14 xy(1-x)(1+y)\,z_{-1,1}.
\end{aligned}$$

* In a Table of three or four thousand entries we shall certainly need to go as far as $4\sigma_1$, possibly 
further. 
† K. Pearson, Tracts for Computers, No. III, p. 26. Cambridge University Press. 

This surface passes through the tops of all nine ordinates of the cell scheme: 

     Central cell ordinates.                          Cell frequencies. 

                  −y                                         −y 
      $z_{-1,-1}$   $z_{0,-1}$   $z_{1,-1}$                  $f_{-1,-1}$   $f_{0,-1}$   $f_{1,-1}$ 
 −x   $z_{-1,0}$    $z_{0,0}$    $z_{1,0}$    +x        −x   $f_{-1,0}$    $f_{0,0}$    $f_{1,0}$    +x 
      $z_{-1,1}$    $z_{0,1}$    $z_{1,1}$                   $f_{-1,1}$    $f_{0,1}$    $f_{1,1}$ 
                  +y                                         +y 

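Corresponding to the cell ordinates z, we have the nine cell frequencies f. If 
we note the following integrals: 

$$\int_{-\frac12}^{+\frac12}(1-x^2)\,dx = \int_{-\frac12}^{+\frac12}(1-y^2)\,dy = \tfrac{11}{12}; \qquad \int_{-\frac12}^{+\frac12}x(1-x)\,dx = \int_{-\frac12}^{+\frac12}y(1-y)\,dy = -\tfrac{1}{12};$$

$$\int_{-\frac12}^{+\frac12}x(1+x)\,dx = \int_{-\frac12}^{+\frac12}y(1+y)\,dy = \tfrac{1}{12}; \qquad \int_{+\frac12}^{+\frac32}(1-x^2)\,dx = \int_{+\frac12}^{+\frac32}(1-y^2)\,dy = -\tfrac{1}{12};$$

$$\int_{+\frac12}^{+\frac32}x(1-x)\,dx = \int_{+\frac12}^{+\frac32}y(1-y)\,dy = -\tfrac{1}{12}; \qquad \int_{+\frac12}^{+\frac32}x(1+x)\,dx = \int_{+\frac12}^{+\frac32}y(1+y)\,dy = \tfrac{25}{12};$$

we can readily integrate for the volume of every cell content, and find at once: 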


$$f_{0,0} = \tfrac{1}{576}\{484z_{0,0} + 22(z_{0,1} + z_{0,-1} + z_{1,0} + z_{-1,0}) + z_{1,1} + z_{1,-1} + z_{-1,1} + z_{-1,-1}\} \quad (\kappa),$$

$$f_{1,1} = \tfrac{1}{576}\{4z_{0,0} - 50z_{0,1} - 2z_{0,-1} - 50z_{1,0} - 2z_{-1,0} + 625z_{1,1} + 25z_{1,-1} + 25z_{-1,1} + z_{-1,-1}\},$$

$$f_{1,-1} = \tfrac{1}{576}\{4z_{0,0} - 2z_{0,1} - 50z_{0,-1} - 50z_{1,0} - 2z_{-1,0} + 25z_{1,1} + 625z_{1,-1} + z_{-1,1} + 25z_{-1,-1}\},$$

$$f_{-1,1} = \tfrac{1}{576}\{4z_{0,0} - 50z_{0,1} - 2z_{0,-1} - 2z_{1,0} - 50z_{-1,0} + 25z_{1,1} + z_{1,-1} + 625z_{-1,1} + 25z_{-1,-1}\},$$

$$f_{-1,-1} = \tfrac{1}{576}\{4z_{0,0} - 2z_{0,1} - 50z_{0,-1} - 2z_{1,0} - 50z_{-1,0} + z_{1,1} + 25z_{1,-1} + 25z_{-1,1} + 625z_{-1,-1}\},$$

$$f_{0,1} = \tfrac{1}{576}\{-44z_{0,0} + 550z_{0,1} + 22z_{0,-1} - 2z_{1,0} - 2z_{-1,0} + 25z_{1,1} + z_{1,-1} + 25z_{-1,1} + z_{-1,-1}\},$$

$$f_{0,-1} = \tfrac{1}{576}\{-44z_{0,0} + 22z_{0,1} + 550z_{0,-1} - 2z_{1,0} - 2z_{-1,0} + z_{1,1} + 25z_{1,-1} + z_{-1,1} + 25z_{-1,-1}\},$$

$$f_{1,0} = \tfrac{1}{576}\{-44z_{0,0} - 2z_{0,1} - 2z_{0,-1} + 550z_{1,0} + 22z_{-1,0} + 25z_{1,1} + 25z_{1,-1} + z_{-1,1} + z_{-1,-1}\},$$

$$f_{-1,0} = \tfrac{1}{576}\{-44z_{0,0} - 2z_{0,1} - 2z_{0,-1} + 22z_{1,0} + 550z_{-1,0} + z_{1,1} + z_{1,-1} + 25z_{-1,1} + 25z_{-1,-1}\}.$$

The first of these results, i.e. that for $f_{0,0}$, gives the frequency in the central cell 
of a group of nine ordinates, 3 × 3, in the scheme. The other expressions may 
be useful, where it is impossible to use a central cell, for example, towards the 
boundary in crateroid surfaces. 
Let us put $\eta_{0,1} = z_{0,1} + z_{0,-1} + z_{1,0} + z_{-1,0}$ 
and $\eta_{1,1} = z_{1,1} + z_{1,-1} + z_{-1,1} + z_{-1,-1}$; 
further we may write 
$\beta_{1,1} = f_{1,1} + f_{1,-1} + f_{-1,1} + f_{-1,-1}$ and $\beta_{0,1} = f_{0,1} + f_{0,-1} + f_{1,0} + f_{-1,0}$. 

The first of the above equations gives us: 

$$576f_{0,0} = 484z_{0,0} + 22\eta_{0,1} + \eta_{1,1} \quad (\kappa).$$

The sum of the next four provides: 

$$576\beta_{1,1} = 16z_{0,0} - 104\eta_{0,1} + 676\eta_{1,1} \quad (\lambda),$$

while the sum of the last four leads to 

$$576\beta_{0,1} = -176z_{0,0} + 568\eta_{0,1} + 52\eta_{1,1} \quad (\mu).$$

Multiply (κ) by 52 and subtract (μ) from it, we find: 

$$29952f_{0,0} - 576\beta_{0,1} = 25344z_{0,0} + 576\eta_{0,1} \quad (\nu).$$

Multiply (μ) by 13 and subtract (λ) from it, we have: 

$$7488\beta_{0,1} - 576\beta_{1,1} = -2304z_{0,0} + 7488\eta_{0,1} \quad (\xi).$$

Multiply (ν) by 13 and subtract (ξ) from it, there results: 

$$389376f_{0,0} - 14976\beta_{0,1} + 576\beta_{1,1} = 331776z_{0,0} \quad (o).$$

Substituting for $\beta_{0,1}$ and $\beta_{1,1}$ we obtain after dividing by the coefficient of $z_{0,0}$: 

$$z_{0,0} = \tfrac{1}{576}\{676f_{0,0} - 26(f_{0,1} + f_{0,-1} + f_{1,0} + f_{-1,0}) + f_{1,1} + f_{1,-1} + f_{-1,1} + f_{-1,-1}\} \quad (\pi).$$

This is the desired expression for the central ordinate in terms of the frequencies 
of nine adjacent cells. We suppose all cell bases equal; if they are not 1 × 1 squares, 
but rectangles a × b, then the coefficient $\tfrac{1}{576}$ must be replaced by $\tfrac{1}{576ab}$. 

We deduce from (κ) and (λ) 

$$\eta_{0,1} = \tfrac{1}{576}(208f_{0,0} + 568\beta_{0,1} - 44\beta_{1,1}) \quad (\rho),$$

$$\eta_{1,1} = \tfrac{1}{576}(16f_{0,0} + 88\beta_{0,1} + 484\beta_{1,1}) \quad (\sigma).$$


By aid of these equations, I have solved the remaining equations for the 
ordinates in terms of the volumes. These are of value in finding ordinates at the 
boundaries of correlation tables. They are: 

$$z_{0,1} = \tfrac{1}{576}\{52f_{0,0} + 598f_{0,1} - 26f_{0,-1} - 2(f_{1,0} + f_{-1,0}) + f_{1,-1} + f_{-1,-1} - 23(f_{1,1} + f_{-1,1})\},$$

$$z_{1,0} = \tfrac{1}{576}\{52f_{0,0} + 598f_{1,0} - 26f_{-1,0} - 2(f_{0,1} + f_{0,-1}) + f_{-1,1} + f_{-1,-1} - 23(f_{1,1} + f_{1,-1})\},$$

$$z_{0,-1} = \tfrac{1}{576}\{52f_{0,0} + 598f_{0,-1} - 26f_{0,1} - 2(f_{1,0} + f_{-1,0}) + f_{1,1} + f_{-1,1} - 23(f_{1,-1} + f_{-1,-1})\},$$

$$z_{-1,0} = \tfrac{1}{576}\{52f_{0,0} + 598f_{-1,0} - 26f_{1,0} - 2(f_{0,1} + f_{0,-1}) + f_{1,1} + f_{1,-1} - 23(f_{-1,1} + f_{-1,-1})\},$$


and : 


at a (4fo, ot 276f,, i 254f_,, —1 — 23( A, atfau, 1)+46(f,, its, o- 2(fa, atfur, of 
2, = why [4foo+ 276f, 1 + 2547, -—23( fi, t+ faa) + 46 (fio t fo, -2( Payot Sav}, 
24,1 % ate {4/,, ot 276f_ 1, 1+254f,, ~— 23( | atf, )+46(f, tfo, i)- 2( fo, atf, ots 

ata (4foo+ 27 66a, 1 + 254f, 1-23 (fi, at far) +46(fo, atfao)-2Hyotfuv}s 


Zi 


241=5 


The above are useful, but of course only approximative, methods of passing from 
ordinates to volumes or from volumes to ordinates in the case of frequency surfaces. 
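
The passage from ordinates to volumes and back can be tested by exact arithmetic. The sketch below is an illustration, not the working of the Appendix: it builds the nine cell frequencies from the one-dimensional integrals of the interpolation formula (weights 22, 1, 1 for the central strip and −2, 25, 1 for an outer strip, each over 24), and then recovers the central ordinate from the frequencies by the 676, −26, +1 rule. The names are hypothetical and the ordinates are invented for the check.

```python
from fractions import Fraction
from itertools import product

INDICES = (-1, 0, 1)

# One-dimensional weights, each to be divided by 24: WEIGHTS[strip][ordinate]
# is the integral over that unit strip of the Lagrange factor of that ordinate.
WEIGHTS = {
    0: {0: 22, 1: 1, -1: 1},
    1: {0: -2, 1: 25, -1: 1},
    -1: {0: -2, 1: 1, -1: 25},
}


def cell_frequencies(z):
    """Volumes of the nine unit cells under the nine-point surface."""
    f = {}
    for a, b in product(INDICES, INDICES):
        f[(a, b)] = sum(
            Fraction(WEIGHTS[a][i] * WEIGHTS[b][j], 576) * z[(i, j)]
            for i, j in product(INDICES, INDICES)
        )
    return f


def central_ordinate(f):
    """Central ordinate recovered from the nine cell frequencies."""
    edges = f[(0, 1)] + f[(0, -1)] + f[(1, 0)] + f[(-1, 0)]
    corners = f[(1, 1)] + f[(1, -1)] + f[(-1, 1)] + f[(-1, -1)]
    return (676 * f[(0, 0)] - 26 * edges + corners) / 576


# Invented ordinates, for the check only.
z = {(i, j): Fraction(50 - 3 * i * i - 2 * j * j + i * j + 4 * i)
     for i, j in product(INDICES, INDICES)}
f = cell_frequencies(z)
print(central_ordinate(f) == z[(0, 0)])
```

Exact fractions are used so that the round trip can be compared for equality, free of rounding.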
ON A SKEW CORRELATION SURFACE. 
By E. C. RHODES, M.Sc.* 

PROFESSOR EDGEWORTH has considered as a skew correlation surface of interest 
the surface 

$$z = Z - \tfrac16\,(q_{30}Z_{30} + 3q_{21}Z_{21} + 3q_{12}Z_{12} + q_{03}Z_{03}),$$

where 

$$Z = \frac{1}{2\pi\sqrt{1-r^2}}\;e^{-\frac12\,\frac{1}{1-r^2}\,(x^2 - 2rxy + y^2)},$$

expressing the variables x, y in terms of their standard deviations, where 

$$Z_{mn} = \frac{d^{m+n}Z}{dx^m\,dy^n},$$

and where the q's are other constants involved in the surface, which can be shown 
to be moment coefficients. 

From the equation 

$$\log Z = -\tfrac12 C\,(x^2 - 2rxy + y^2) - \log\left(\frac{2\pi}{\sqrt{C}}\right),$$

writing $C = \dfrac{1}{1-r^2}$, we get 

$$Z_{10} = -CZ(x-ry), \qquad Z_{01} = -CZ(y-rx),$$

$$Z_{20} = -CZ_{10}(x-ry) - CZ = -CZ + C^2Z(x-ry)^2, \qquad Z_{02} = -CZ_{01}(y-rx) - CZ = -CZ + C^2Z(y-rx)^2,$$

$$Z_{30} = 3C^2Z(x-ry) - C^3Z(x-ry)^3, \qquad Z_{03} = 3C^2Z(y-rx) - C^3Z(y-rx)^3,$$

$$Z_{11} = -CZ_{01}(x-ry) + rCZ = rCZ + C^2Z(x-ry)(y-rx),$$

$$Z_{21} = C^2Z(y-rx) - 2rC^2Z(x-ry) - C^3Z(x-ry)^2(y-rx),$$

$$Z_{12} = C^2Z(x-ry) - 2rC^2Z(y-rx) - C^3Z(y-rx)^2(x-ry).$$
1. Moments of the surface. We will assume that $x$ and $y$ may vary from
$-\infty$ to $+\infty$, and all integrations will be taken between these limits. See Note,
p. 326.


(i) We have:

$$\iint z\,dx\,dy = \iint\left[Z - \tfrac{1}{6}\left(q_{30}Z_{30} + 3q_{21}Z_{21} + 3q_{12}Z_{12} + q_{03}Z_{03}\right)\right]dx\,dy.$$


* Received June 4, 1925, 
Now $\iint Z\,dx\,dy = 1$,

$$\iint Z_{30}\,dx\,dy = \int\left[Z_{20}\right]_{-\infty}^{\infty}dy = 0,$$

$$\iint Z_{21}\,dx\,dy = \int\left[Z_{11}\right]_{-\infty}^{\infty}dy = 0,$$

because Z vanishes for extreme x and y, and similarly for the other integrals.


Consequently the volume of the surface is unity.


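(ii) Consider $\iint zx\,dx\,dy$. We know $\iint Zx\,dx\,dy = 0$.

$$\iint Z_{30}\,x\,dx\,dy = -\iint Z_{20}\,dx\,dy = -\int\left[Z_{10}\right]_{-\infty}^{\infty}dy = 0,$$

$$\iint Z_{03}\,x\,dx\,dy = \int\left[Z_{02}\right]_{-\infty}^{\infty}x\,dx = 0,$$

for the same reasons as before. So with the other integrals. So also with
$\iint zy\,dx\,dy$.


Consequently the mean of the surface is the origin.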


(iii) Consider $\iint zx^2\,dx\,dy$. We know $\iint Zx^2\,dx\,dy = 1$.

$$\iint Z_{30}\,x^2\,dx\,dy = -\iint Z_{20}\,2x\,dx\,dy = 2\iint Z_{10}\,dx\,dy = 2\int\left[Z\right]_{-\infty}^{\infty}dy = 0,$$

$$\iint Z_{03}\,x^2\,dx\,dy = \int\left[Z_{02}\right]_{-\infty}^{\infty}x^2\,dx = 0.$$

Similarly with the other integrals. So the standard deviation of the x-variable
is unity. Similarly with the y-variable.


(iv) Consider $\iint zxy\,dx\,dy$. We know $\iint Zxy\,dx\,dy = r$.

$$\iint Z_{30}\,xy\,dx\,dy = -\iint Z_{20}\,y\,dx\,dy = -\int\left[Z_{10}\right]_{-\infty}^{\infty}y\,dy = 0,$$

$$\iint Z_{21}\,xy\,dx\,dy = -\iint Z_{11}\,y\,dx\,dy = \iint Z_{10}\,dx\,dy = \int\left[Z\right]_{-\infty}^{\infty}dy = 0.$$

Similarly with the other integrals.


Consequently r is the correlation coefficient between the variables.
(v) Consider $\iint zx^3\,dx\,dy$. We know $\iint Zx^3\,dx\,dy = 0$.

$$\iint Z_{30}\,x^3\,dx\,dy = -\iint Z_{20}\,3x^2\,dx\,dy = \iint Z_{10}\,6x\,dx\,dy = -\iint 6Z\,dx\,dy = -6.$$

As before the other integrals are zero.


Consequently the third moment coefficient of the x-variable is $q_{30}$.


We find in exactly the same way that $q_{03}$ is the third moment coefficient of the
y-variable, and that $q_{21}$, $q_{12}$ are the product moment coefficients obtained from
$\Sigma x^2y$, $\Sigma xy^2$ respectively. As mentioned above the q's are the higher moment
coefficients of the surface.
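
The moment results above can be checked numerically. The following is a minimal sketch, not part of the original paper: it keeps only the $q_{30}$ term of the surface, uses the closed form $Z_{30} = 3C^2Z(x-ry) - C^3Z(x-ry)^3$, and sums over a square grid. The function name, the grid width and the step are illustrative assumptions.

```python
import math


def surface_moments(r, q30, half_width=8.0, step=0.1):
    """Volume and third x-moment of z = Z - (q30/6) * Z30 by a plain grid sum.

    Assumption for illustration: q21 = q12 = q03 = 0, so only Z30 is needed.
    """
    C = 1.0 / (1.0 - r * r)
    n = int(round(2 * half_width / step))
    volume = 0.0
    third = 0.0
    for i in range(n + 1):
        x = -half_width + i * step
        for j in range(n + 1):
            y = -half_width + j * step
            # Normal surface Z with unit standard deviations and correlation r.
            Z = math.sqrt(C) / (2 * math.pi) * math.exp(
                -0.5 * C * (x * x - 2 * r * x * y + y * y))
            t = x - r * y
            Z30 = 3 * C * C * Z * t - C ** 3 * Z * t ** 3
            z = Z - q30 / 6.0 * Z30
            volume += z
            third += z * x ** 3
    cell = step * step
    return volume * cell, third * cell


if __name__ == "__main__":
    vol, m3 = surface_moments(r=0.5, q30=0.4)
    print("volume:", vol)
    print("third moment of x:", m3)
```

Under these assumptions the volume should come out close to unity and the third moment close to the chosen $q_{30}$, in agreement with (i) and (v).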


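2. The array.


(i) Consider $\int z\,dx$, keeping y constant. Call this $z_y$, the sum of the array.

We know $\displaystyle\int Z\,dx = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}y^2} = v$ (say).

$$\int Z_{30}\,dx = \left[Z_{20}\right]_{-\infty}^{\infty} = 0, \text{ so also with } Z_{21},\ Z_{12},$$

$$\int Z_{03}\,dx = \frac{d^3}{dy^3}\int Z\,dx = \frac{d^3v}{dy^3} = v_3 \text{ (say)}.$$

Thus
$$z_y = v - \tfrac{1}{6}q_{03}v_3 = v\left[1 - \tfrac{1}{2}q_{03}\left(y - \frac{y^3}{3}\right)\right].$$

Similarly
$$z_x = \int z\,dy = u - \tfrac{1}{6}q_{30}u_3,$$
where $u = \dfrac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}x^2}$ and $u_3 = \dfrac{d^3u}{dx^3}$.


Thus the marginal curves of the surface are of the type developed by Edgeworth
and others.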


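(ii) Consider $\int zx\,dx$, keeping y constant. Call this $z_y\bar{x}_y$.

We know $\int Zx\,dx = vry$.

$$\int Z_{30}\,x\,dx = \left[Z_{20}\,x\right]_{-\infty}^{\infty} - \int Z_{20}\,dx = 0,$$

$$\int Z_{21}\,x\,dx = \left[Z_{11}\,x\right]_{-\infty}^{\infty} - \int Z_{11}\,dx = 0,$$

$$\int Z_{12}\,x\,dx = \left[Z_{02}\,x\right]_{-\infty}^{\infty} - \int Z_{02}\,dx = -\frac{d^2}{dy^2}\int Z\,dx = -v_2 \text{ (say)},$$

$$\int Z_{03}\,x\,dx = \frac{d^3}{dy^3}(vry) = v_3ry + 3v_2r.$$

So we have
$$z_y\bar{x}_y = vry + \tfrac{1}{2}q_{12}v_2 - \tfrac{1}{6}q_{03}\left(v_3ry + 3v_2r\right) = ry\left(v - \tfrac{1}{6}q_{03}v_3\right) + \tfrac{1}{2}v_2\left(q_{12} - rq_{03}\right),$$

therefore
$$\bar{x}_y = ry + \tfrac{1}{2}\left(q_{12} - rq_{03}\right)\frac{v_2}{v - \tfrac{1}{6}q_{03}v_3},$$

or
$$\bar{x}_y = ry - \tfrac{1}{2}\left(q_{12} - rq_{03}\right)\frac{1 - y^2}{1 - \tfrac{1}{2}q_{03}\left(y - \frac{y^3}{3}\right)},$$

which gives the equation to a regression line. The other is similarly given by

$$\bar{y}_x = rx - \tfrac{1}{2}\left(q_{21} - rq_{30}\right)\frac{1 - x^2}{1 - \tfrac{1}{2}q_{30}\left(x - \frac{x^3}{3}\right)}.$$

It will be seen that the regression lines, in general, are not straight, and the
conditions of rectilinearity are $q_{12} = rq_{03}$ and $q_{21} = rq_{30}$. These conditions have been
obtained before and are well known.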


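(iii) Consider $\int zx^2\,dx$, keeping y constant. Call this $z_y\left(\sigma_y^2 + \bar{x}_y^2\right)$.

We know $\int Zx^2\,dx = v\left[1 - r^2 + r^2y^2\right]$.

$$\int Z_{30}\,x^2\,dx = -\int Z_{20}\,2x\,dx = \int Z_{10}\,2\,dx = 0,$$

$$\int Z_{21}\,x^2\,dx = -\int Z_{11}\,2x\,dx = \int Z_{01}\,2\,dx = 2\frac{d}{dy}(v) = 2v_1,$$

$$\int Z_{12}\,x^2\,dx = -\int Z_{02}\,2x\,dx = -2\frac{d^2}{dy^2}\int Zx\,dx = -2\frac{d^2}{dy^2}(vry),$$

$$\int Z_{03}\,x^2\,dx = \frac{d^3}{dy^3}\int Zx^2\,dx = \frac{d^3}{dy^3}\left[v\left(1 - r^2 + r^2y^2\right)\right].$$

So we have
$$z_y\left(\sigma_y^2 + \bar{x}_y^2\right) = v\left(1 - r^2 + r^2y^2\right) - \tfrac{1}{6}\left\{3q_{21}\times 2v_1 - 3q_{12}\times 2\left(ryv_2 + 2rv_1\right) + q_{03}\left[v_3\left(1 - r^2 + r^2y^2\right) + 3v_2\times 2r^2y + 3v_1\times 2r^2\right]\right\}$$
$$= v\left(1 - r^2 + r^2y^2\right) - q_{21}v_1 + rq_{12}\left(yv_2 + 2v_1\right) - \frac{q_{03}}{6}\left[v_3\left(1 - r^2 + r^2y^2\right) + 6v_2r^2y + 6v_1r^2\right]$$
$$= \left(v - \tfrac{1}{6}q_{03}v_3\right)\left(1 - r^2 + r^2y^2\right) - \left(q_{21} - 2rq_{12} + r^2q_{03}\right)v_1 + ryv_2\left(q_{12} - rq_{03}\right),$$

therefore
$$\sigma_y^2 + \bar{x}_y^2 = 1 - r^2 + r^2y^2 - \frac{\left(q_{21} - 2rq_{12} + r^2q_{03}\right)v_1 - v_2ry\left(q_{12} - rq_{03}\right)}{v - \tfrac{1}{6}q_{03}v_3}.$$

But
$$\bar{x}_y^2 = r^2y^2 + \frac{v_2ry\left(q_{12} - rq_{03}\right)}{v - \tfrac{1}{6}q_{03}v_3} + \frac{\tfrac{1}{4}\left(q_{12} - rq_{03}\right)^2v_2^2}{\left(v - \tfrac{1}{6}q_{03}v_3\right)^2},$$

therefore
$$\sigma_y^2 = 1 - r^2 - \frac{\left(q_{21} - 2rq_{12} + r^2q_{03}\right)v_1}{v - \tfrac{1}{6}q_{03}v_3} - \tfrac{1}{4}\,\frac{\left(q_{12} - rq_{03}\right)^2v_2^2}{\left(v - \tfrac{1}{6}q_{03}v_3\right)^2}$$
$$= 1 - r^2 + \frac{\left(q_{21} - 2rq_{12} + r^2q_{03}\right)y}{1 - \tfrac{1}{2}q_{03}\left(y - \frac{y^3}{3}\right)} - \tfrac{1}{4}\,\frac{\left(q_{12} - rq_{03}\right)^2\left(1 - y^2\right)^2}{\left[1 - \tfrac{1}{2}q_{03}\left(y - \frac{y^3}{3}\right)\right]^2},$$

which gives the standard deviation of the array.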


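From this last we may obtain the correlation ratio, thus:

$$1 - \eta_y^2 = \int\sigma_y^2z_y\,dy = \left(1 - r^2\right)\int z_y\,dy - \left(q_{21} - 2rq_{12} + r^2q_{03}\right)\int v_1\,dy - \tfrac{1}{4}\left(q_{12} - rq_{03}\right)^2\int\frac{v_2^2}{v - \tfrac{1}{6}q_{03}v_3}\,dy,$$

therefore
$$\eta_y^2 = r^2 + \tfrac{1}{4}\left(q_{12} - rq_{03}\right)^2\int\frac{v_2^2}{v - \tfrac{1}{6}q_{03}v_3}\,dy.$$

[$\eta_y$ is the correlation ratio obtained by considering the standard deviation of
y-arrays.]


The integral in this expression simplifies when $q_{03} = 0$.

Then we have
$$\int\left(1 - y^2\right)^2\frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}y^2}\,dy = 2.$$

Thus we have as a simple case, when the marginal curve is a normal curve,
$$\eta_y^2 = r^2 + \tfrac{1}{2}q_{12}^2.$$

In the extreme case when $q_{30}$ and $q_{03} = 0$, and both marginal curves are Gaussian,
we have the regression lines
$$\bar{x}_y = ry - \tfrac{1}{2}q_{12}\left(1 - y^2\right),$$
$$\bar{y}_x = rx - \tfrac{1}{2}q_{21}\left(1 - x^2\right),$$
i.e. parabolas, and
$$\eta_y^2 = r^2 + \tfrac{1}{2}q_{12}^2, \qquad \eta_x^2 = r^2 + \tfrac{1}{2}q_{21}^2.$$

The standard deviations of the arrays are given by
$$\sigma_y^2 = 1 - r^2 + \left(q_{21} - 2rq_{12}\right)y - \tfrac{1}{4}q_{12}^2\left(1 - y^2\right)^2,$$
$$\sigma_x^2 = 1 - r^2 + \left(q_{12} - 2rq_{21}\right)x - \tfrac{1}{4}q_{21}^2\left(1 - x^2\right)^2.$$


This is a case of a skew surface with normal curves for the margins and
parabolic regression.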
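
As an illustration of the array formulae, the regression curve and the array variance may be evaluated directly. This is a minimal sketch under the formulae reconstructed above; the function names `array_mean_x` and `array_var_x` are assumptions introduced here, not notation from the paper.

```python
def array_mean_x(y, r, q12, q03):
    """Mean of x in the array at y (regression of x on y), x and y in sigma units."""
    numerator = 0.5 * (q12 - r * q03) * (1.0 - y * y)
    denominator = 1.0 - 0.5 * q03 * (y - y ** 3 / 3.0)
    return r * y - numerator / denominator


def array_var_x(y, r, q21, q12, q03):
    """Variance of x in the array at y."""
    denominator = 1.0 - 0.5 * q03 * (y - y ** 3 / 3.0)
    a = q21 - 2.0 * r * q12 + r * r * q03
    b = q12 - r * q03
    return (1.0 - r * r
            + a * y / denominator
            - 0.25 * b * b * (1.0 - y * y) ** 2 / denominator ** 2)


if __name__ == "__main__":
    # Normal margins (q03 = 0) with a parabolic regression line.
    for y in (-1.0, 0.0, 1.0, 2.0):
        print(y, array_mean_x(y, r=0.5, q12=0.3, q03=0.0),
              array_var_x(y, r=0.5, q21=0.3, q12=0.3, q03=0.0))
```

With all the q's zero the functions reduce to the normal-surface values $ry$ and $1 - r^2$, and with $q_{12} = rq_{03}$ the regression is rectilinear, as stated in the text.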


3. Let us consider the problem of finding the relationship between two variates
when the individuals are arranged in order of merit, the problem of ranks and
grades. Professor Karl Pearson has developed the method in Drapers' Company
Research Memoir, Biometric Series, No. IV.

The essential parts of the argument only will be reproduced here. N individuals
are distributed in a frequency group according to the extent to which they
possess a variable quantity x. Let the frequency curve be $z = Nf(x)$, which
ranges from $x = -a_1$ to $x = a_2$. Then the grade of the individual of character x
is $g(x) = \int_x^{a_2}Nf(x)\,dx$, and $\dfrac{dg}{dx} = -Nf(x)$. Mean grade is $\dfrac{N}{2} = \bar{g}$. $\sigma_g^2 = \dfrac{N^2}{12}$ gives the
standard deviation of the grades.

Now suppose that two correlated variables are considered, and let the surface
be $z = NF(x, y)$; the marginal curves being $z = Nf(x)$, $z = N\phi(y)$. To get the
correlation between grades we require the integral

$$I = \iint\left(g_1 - \bar{g}_1\right)\left(g_2 - \bar{g}_2\right)NF(x, y)\,dx\,dy,$$

the integration being taken over the whole range of x, y, where $g_1$, $g_2$ represent
the grades of the individual which possesses the two characters to the amount
x, y respectively, when the individuals are arranged in order first with respect to
the one character x, then with respect to the other character y. We have

$$\frac{dg_1}{dx} = -Nf(x), \qquad \frac{dg_2}{dy} = -N\phi(y).$$

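Differentiating I with regard to r,

$$\frac{dI}{dr} = \iint\left(g_1 - \bar{g}_1\right)\left(g_2 - \bar{g}_2\right)N\frac{dF}{dr}\,dx\,dy.$$

Now with certain surfaces $\dfrac{dF}{dr} = \dfrac{d^2F}{dx\,dy}$;
assuming that our surface is such an one,

$$\frac{dI}{dr} = \iint\left(g_1 - \bar{g}_1\right)\left(g_2 - \bar{g}_2\right)N\frac{d^2F}{dx\,dy}\,dx\,dy = \iint NF\,\frac{dg_1}{dx}\,\frac{dg_2}{dy}\,dx\,dy = N^3\iint F\,f(x)\,\phi(y)\,dx\,dy.$$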


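Now we will show that the surface considered in the first part of the paper
satisfies this condition $\dfrac{dF}{dr} = \dfrac{d^2F}{dx\,dy}$. We are still supposing x and y are measured in
terms of their standard deviations.
The surface we are considering is

$$z = Z - \tfrac{1}{6}\left(q_{30}Z_{30} + 3q_{21}Z_{21} + 3q_{12}Z_{12} + q_{03}Z_{03}\right).$$

And since
$$\frac{dZ}{dr} = \frac{d^2Z}{dx\,dy}, \qquad \frac{d}{dr}\left(\frac{d^{\,m+n}Z}{dx^m\,dy^n}\right) = \frac{d^{\,m+n+2}Z}{dx^{m+1}\,dy^{n+1}},$$
so for our surface
$$\frac{dz}{dr} = \frac{d^2z}{dx\,dy}.$$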

Consequently we can use the method of Professor Pearson, when we assume the
frequency surface to be that considered here, instead of the normal surface which
he considers. So we shall write

$$\frac{1}{N^3}\frac{dI}{dr} = \iint\left[Z - \tfrac{1}{6}\left(q_{30}Z_{30} + 3q_{21}Z_{21} + 3q_{12}Z_{12} + q_{03}Z_{03}\right)\right]\left(u - \tfrac{1}{6}q_{30}u_3\right)\left(v - \tfrac{1}{6}q_{03}v_3\right)dx\,dy,$$

using the notation already adopted for the marginal curves. The expression to be
integrated is
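$$Zuv - \tfrac{1}{6}\left(q_{30}uvZ_{30} + 3q_{21}uvZ_{21} + 3q_{12}uvZ_{12} + q_{03}uvZ_{03} + q_{30}u_3vZ + q_{03}uv_3Z\right)$$
$$+ \tfrac{1}{36}\left[q_{30}^2u_3vZ_{30} + q_{03}^2uv_3Z_{03} + q_{30}q_{03}\left(uv_3Z_{30} + u_3vZ_{03} + u_3v_3Z\right) + 3q_{21}q_{30}u_3vZ_{21} + 3q_{12}q_{03}uv_3Z_{12} + 3q_{21}q_{03}uv_3Z_{21} + 3q_{12}q_{30}u_3vZ_{12}\right]$$
$$- \tfrac{1}{216}\,q_{30}q_{03}u_3v_3\left(q_{30}Z_{30} + 3q_{21}Z_{21} + 3q_{12}Z_{12} + q_{03}Z_{03}\right) \qquad\ldots(1).$$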
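Now
$$Zuv = \frac{\sqrt{C}}{(2\pi)^2}\,e^{-\frac{1}{2}\left[x^2 + y^2 + C\left(x^2 + y^2 - 2rxy\right)\right]} = \frac{\sqrt{C}}{(2\pi)^2}\,e^{-\frac{1}{2}(C+1)\left[x^2 + y^2 - 2\frac{rC}{C+1}xy\right]},$$
which we can write
$$\frac{\sqrt{C}\,s^2\sqrt{1-R^2}}{2\pi}\cdot\frac{1}{2\pi s^2\sqrt{1-R^2}}\,e^{-\frac{1}{2(1-R^2)}\left(\frac{x^2}{s^2} + \frac{y^2}{s^2} - \frac{2Rxy}{s^2}\right)},$$
where
$$\frac{1}{\left(1-R^2\right)s^2} = C + 1 \quad\text{and}\quad \frac{R}{s^2\left(1-R^2\right)} = rC.$$
Thus
$$R = \frac{r}{2 - r^2},$$
and
$$s^2 = \frac{2 - r^2}{4 - r^2}, \qquad 1 - R^2 = \frac{\left(1 - r^2\right)\left(4 - r^2\right)}{\left(2 - r^2\right)^2}.$$
Thus
$$Zuv = \frac{1}{2\pi\sqrt{4 - r^2}}\cdot\frac{1}{2\pi s^2\sqrt{1-R^2}}\,e^{-\frac{1}{2(1-R^2)}\left(\frac{x^2}{s^2} + \frac{y^2}{s^2} - \frac{2Rxy}{s^2}\right)} \qquad\ldots(2),$$

and as R varies between ±1, when r varies between ±1, it will be seen that the
equation $z = Zuv$ is that of a normal frequency surface, the total volume (V) of
which is $\dfrac{1}{2\pi\sqrt{4 - r^2}}$, the correlation coefficient being R and the standard deviations
of the marginal curves both s.

The integration of the various terms of the expression (1) will reduce to the
finding of various product moment coefficients of the surface (2).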
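
The constants of the equivalent normal surface (2) are simple functions of r. The following sketch, with a hypothetical function name, computes R, $s^2$ and V and confirms the two defining relations; it is offered only as a numerical check of the algebra above.

```python
import math


def equivalent_normal_constants(r):
    """Return (R, s2, V) for the normal surface z = Z*u*v of equation (2)."""
    R = r / (2.0 - r * r)
    s2 = (2.0 - r * r) / (4.0 - r * r)
    V = 1.0 / (2.0 * math.pi * math.sqrt(4.0 - r * r))
    return R, s2, V


if __name__ == "__main__":
    r = 0.6
    C = 1.0 / (1.0 - r * r)
    R, s2, V = equivalent_normal_constants(r)
    # Both printed differences should be negligibly small.
    print(1.0 / ((1.0 - R * R) * s2) - (C + 1.0))
    print(R / ((1.0 - R * R) * s2) - r * C)
    print(R, s2, V)
```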


Consider first the terms only involving the q's to the first order

$$\iint uvZ_{30}\,dx\,dy = \int\left[uZ_{20}\right]v\,dy - \iint u_1vZ_{20}\,dx\,dy,$$
remembering that u is a function of x only, v a function of y only. This is equal to
$$-\int\left[u_1Z_{10}\right]v\,dy + \iint u_2vZ_{10}\,dx\,dy = \int\left[u_2Z\right]v\,dy - \iint u_3vZ\,dx\,dy,$$

where the terms in brackets [ ] at each stage are zero, owing to the exponential
terms in the integrand.


Now $u_3 = u\left(3x - x^3\right)$, and the final integral is zero, because it is equivalent to
the finding of the first and third moment coefficients of the margin of a normal
surface.
Remembering that $\dfrac{u_1}{u}, \dfrac{u_3}{u}, \dfrac{u_5}{u}, \ldots$ are equal to odd functions of x, that $\dfrac{u_2}{u}, \dfrac{u_4}{u}, \ldots$
are equal to even functions of x; similarly with $\dfrac{v_1}{v}, \dfrac{v_2}{v}, \ldots$, and that

$$\iint u_3vZ_{30}\,dx\,dy = -\iint u_6vZ\,dx\,dy,$$

$$\iint u_3vZ_{21}\,dx\,dy = -\iint u_5v_1Z\,dx\,dy,$$

$$\iint u_3vZ_{12}\,dx\,dy = -\iint u_4v_2Z\,dx\,dy,$$

$$\iint u_3vZ_{03}\,dx\,dy = -\iint u_3v_3Z\,dx\,dy,$$
and so on; also remembering that all product moments of a normal surface of the
type $p_{2k+1,\,2l}$, $p_{2k,\,2l+1}$ are zero, e.g.
$$p_{10} = p_{01} = p_{30} = p_{03} = p_{21} = p_{12} = p_{32} = p_{23} = p_{41} = p_{14} = \text{etc.}$$
are all zero for a normal surface, we shall find it possible to evaluate the
expression (1).

We realise that the integrals arising from the terms involving the q's in the
first degree, and in the third degree in the expression (1), are all equivalent to
moments of a normal surface of the form $p_{2k+1,\,2l}$ or $p_{2k,\,2l+1}$, and are zero. The only
part of the expression (1) which contributes anything after $Zuv$ is the term
with coefficient $\tfrac{1}{36}$, involving the q's to second order products.

Concentrating now on these we have to write down the product moments of
the normal surface, whose volume is V, correlation coefficient R and standard
deviation of each margin s. Calling

$$\iint Zuv\,x^my^n\,dx\,dy = Vp_{mn}\,s^{m+n},$$

we have $p_{40} = p_{04} = 3$, $p_{60} = p_{06} = 15$, $p_{31} = p_{13} = 3R$, $p_{22} = 1 + 2R^2$, $p_{51} = p_{15} = 15R$,
$p_{42} = p_{24} = 3 + 12R^2$, $p_{33} = 9R + 6R^3$.


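$$\iint u_3vZ_{30}\,dx\,dy = -\iint u_6vZ\,dx\,dy = -V\left(15s^6 - 45s^4 + 45s^2 - 15\right),$$
since $u_6 = u\left(x^6 - 15x^4 + 45x^2 - 15\right)$.
$$\iint uv_3Z_{03}\,dx\,dy = -\iint uv_6Z\,dx\,dy = -V\left(15s^6 - 45s^4 + 45s^2 - 15\right).$$
In the same way:
$$\iint uv_3Z_{30}\,dx\,dy = -\iint u_3v_3Z\,dx\,dy = \iint u_3vZ_{03}\,dx\,dy$$
$$= -\iint uvZ\left(3x - x^3\right)\left(3y - y^3\right)dx\,dy, \text{ since } u_3 = u\left(3x - x^3\right),$$
$$= -V\left[9Rs^2 - 18Rs^4 + \left(9R + 6R^3\right)s^6\right],$$
$$\iint uv_3Z_{21}\,dx\,dy = -\iint u_2v_4Z\,dx\,dy = -\iint uvZ\left(x^2 - 1\right)\left(3 - 6y^2 + y^4\right)dx\,dy,$$
since $v_4 = v\left(3 - 6y^2 + y^4\right)$.
This integral $= +V\left[3 - 9s^2 + 3s^4 + 6\left(1 + 2R^2\right)s^4 - \left(3 + 12R^2\right)s^6\right]$.
$$\iint uv_3Z_{12}\,dx\,dy = -\iint u_1v_5Z\,dx\,dy = -\iint uvZ\,x\left(15y - 10y^3 + y^5\right)dx\,dy,$$
since $v_5 = -v\left(15y - 10y^3 + y^5\right)$.
This integral $= -V\left[15Rs^2 - 30Rs^4 + 15Rs^6\right]$.

$$\iint u_3vZ_{21}\,dx\,dy = -\iint u_5v_1Z\,dx\,dy = -V\left[15Rs^2 - 30Rs^4 + 15Rs^6\right],$$

$$\iint u_3vZ_{12}\,dx\,dy = -\iint u_4v_2Z\,dx\,dy = \text{same as } \iint uv_3Z_{21}\,dx\,dy \text{ above}.$$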


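So we get

$$\frac{1}{N^3}\frac{dI}{dr} = V\Big[1 + \tfrac{1}{36}\left(q_{30}^2 + q_{03}^2\right)\left(15 - 45s^2 + 45s^4 - 15s^6\right) - \tfrac{1}{36}\,q_{30}q_{03}\left(9Rs^2 - 18Rs^4 + \left(9R + 6R^3\right)s^6\right)$$
$$+ \tfrac{1}{12}\left(q_{21}q_{03} + q_{12}q_{30}\right)\left(3 - 9s^2 + \left(9 + 12R^2\right)s^4 - \left(3 + 12R^2\right)s^6\right) - \tfrac{1}{12}\left(q_{12}q_{03} + q_{21}q_{30}\right)\left(15Rs^2 - 30Rs^4 + 15Rs^6\right)\Big].$$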
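Examining each part of this in turn

$$15\left(1 - 3s^2 + 3s^4 - s^6\right) = 15\left(1 - s^2\right)^3 = \frac{120}{\left(4 - r^2\right)^3},$$

$$3Rs^2\left(3 - 6s^2 + 3s^4 + 2R^2s^4\right) = 3Rs^2\left[3\left(1 - s^2\right)^2 + 2R^2s^4\right] = \frac{3r}{4 - r^2}\left[\frac{12}{\left(4 - r^2\right)^2} + \frac{2r^2}{\left(4 - r^2\right)^2}\right] = \frac{6r\left(6 + r^2\right)}{\left(4 - r^2\right)^3},$$

$$3\left(1 - 3s^2 + 3s^4 - s^6 + 4R^2s^4\left(1 - s^2\right)\right) = 3\left[\left(1 - s^2\right)^3 + 4R^2s^4\left(1 - s^2\right)\right] = 3\left[\frac{8}{\left(4 - r^2\right)^3} + \frac{8r^2}{\left(4 - r^2\right)^3}\right] = \frac{24\left(1 + r^2\right)}{\left(4 - r^2\right)^3},$$

$$15Rs^2\left(1 - 2s^2 + s^4\right) = 15Rs^2\left(1 - s^2\right)^2 = \frac{60r}{\left(4 - r^2\right)^3}, \quad\text{for } 1 - s^2 = \frac{2}{4 - r^2},\ Rs^2 = \frac{r}{4 - r^2}.$$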


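Thus

$$\frac{1}{N^3}\frac{dI}{dr} = V\left[1 + \frac{10\left(q_{30}^2 + q_{03}^2\right)}{3\left(4 - r^2\right)^3} - \frac{q_{30}q_{03}\,r\left(6 + r^2\right)}{6\left(4 - r^2\right)^3} + \frac{2\left(q_{21}q_{03} + q_{12}q_{30}\right)\left(1 + r^2\right)}{\left(4 - r^2\right)^3} - \frac{5\left(q_{12}q_{03} + q_{21}q_{30}\right)r}{\left(4 - r^2\right)^3}\right]$$

$$= V\left[1 + \frac{\lambda_1 - \lambda_2r\left(6 + r^2\right) + \lambda_3\left(1 + r^2\right) - \lambda_4r}{\left(4 - r^2\right)^3}\right] \text{ (say)},$$

where $\lambda_1 = \tfrac{10}{3}\left(q_{30}^2 + q_{03}^2\right)$, $\lambda_2 = \tfrac{1}{6}q_{30}q_{03}$, $\lambda_3 = 2\left(q_{21}q_{03} + q_{12}q_{30}\right)$, $\lambda_4 = 5\left(q_{12}q_{03} + q_{21}q_{30}\right)$,

$$= V\left[1 + \frac{\lambda_1 + \lambda_2r\left(4 - r^2 - 10\right) - \lambda_3\left(4 - r^2 - 5\right) - \lambda_4r}{\left(4 - r^2\right)^3}\right]$$

$$= V\left[1 + \frac{\lambda_1 + 5\lambda_3}{\left(4 - r^2\right)^3} - \frac{\lambda_3}{\left(4 - r^2\right)^2} + \frac{\lambda_2r}{\left(4 - r^2\right)^2} - \frac{\left(10\lambda_2 + \lambda_4\right)r}{\left(4 - r^2\right)^3}\right].$$

So
$$\frac{dI}{N^3} = \frac{dr}{2\pi\sqrt{4 - r^2}}\left[1 + \frac{\lambda_1 + 5\lambda_3}{\left(4 - r^2\right)^3} - \frac{\lambda_3}{\left(4 - r^2\right)^2} + \frac{\lambda_2r}{\left(4 - r^2\right)^2} - \frac{\left(10\lambda_2 + \lambda_4\right)r}{\left(4 - r^2\right)^3}\right].$$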
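Now if $r = 2\sin\theta$,

$$\int\frac{dr}{\left(4 - r^2\right)^{5/2}} = \int\frac{2\cos\theta\,d\theta}{32\cos^5\theta} = \frac{1}{16}\int\sec^4\theta\,d\theta = \tfrac{1}{16}\left(\tan\theta + \tfrac{1}{3}\tan^3\theta\right),$$

$$\int\frac{dr}{\left(4 - r^2\right)^{7/2}} = \tfrac{1}{64}\left(\tan\theta + \tfrac{2}{3}\tan^3\theta + \tfrac{1}{5}\tan^5\theta\right),$$

$$\int\frac{r\,dr}{\left(4 - r^2\right)^{5/2}} = \tfrac{1}{3}\left(4 - r^2\right)^{-3/2} = \tfrac{1}{24}\sec^3\theta,$$

$$\int\frac{r\,dr}{\left(4 - r^2\right)^{7/2}} = \tfrac{1}{5}\left(4 - r^2\right)^{-5/2} = \tfrac{1}{160}\sec^5\theta.$$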


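Therefore

$$\frac{2\pi I}{N^3} + A \text{ (a constant)} = \theta + \frac{\lambda_1 + 5\lambda_3}{64}\left(\tan\theta + \tfrac{2}{3}\tan^3\theta + \tfrac{1}{5}\tan^5\theta\right) - \frac{\lambda_3}{16}\left(\tan\theta + \tfrac{1}{3}\tan^3\theta\right) + \frac{\lambda_2}{24}\sec^3\theta - \frac{10\lambda_2 + \lambda_4}{160}\sec^5\theta$$

$$= \theta + \frac{\lambda_1}{64}\left(\tan\theta + \tfrac{2}{3}\tan^3\theta + \tfrac{1}{5}\tan^5\theta\right) + \frac{\lambda_2}{48}\left(2\sec^3\theta - 3\sec^5\theta\right) + \frac{\lambda_3}{64}\tan\theta\sec^4\theta - \frac{\lambda_4}{160}\sec^5\theta.$$

Now if ρ is the correlation between grades, $\rho = \dfrac{12I}{N^3}$.
Thus $\dfrac{2\pi I}{N^3} = \dfrac{\pi\rho}{6}$, and when there is no correlation, ρ, r, θ vanish together,
accordingly
$$A = -\frac{\lambda_2}{48} - \frac{\lambda_4}{160}.$$
Hence

$$\frac{\pi\rho}{6} = \theta + \tfrac{5}{96}\left(q_{30}^2 + q_{03}^2\right)\left(\tan\theta + \tfrac{2}{3}\tan^3\theta + \tfrac{1}{5}\tan^5\theta\right) + \frac{q_{30}q_{03}}{288}\left(1 + 2\sec^3\theta - 3\sec^5\theta\right)$$
$$+ \frac{q_{21}q_{03} + q_{12}q_{30}}{32}\tan\theta\sec^4\theta - \frac{q_{12}q_{03} + q_{21}q_{30}}{32}\left(\sec^5\theta - 1\right) \qquad\ldots(3).$$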


We may note that if $q_{30} = q_{03} = 0$, we get the same result as in the case of the
normal surface, $\dfrac{\pi\rho}{6} = \theta$.

Thus $r = 2\sin\dfrac{\pi\rho}{6}$, in the case of a skew surface the margins of which are
normal curves, and where the regression lines are parabolas. It must be remembered
that the true measure of the relationship in this case between the variables
is given by η.

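If we suppose that the regression is nearly linear both ways we shall have
$q_{21} = rq_{30}$ approximately and $q_{12} = rq_{03}$ approximately. If we substitute in (3) for
these values we shall get

$$\frac{\pi\rho}{6} = \theta + \tfrac{5}{96}\left(q_{30}^2 + q_{03}^2\right)\left(\tan\theta + \tfrac{2}{3}\tan^3\theta + \tfrac{1}{5}\tan^5\theta\right) + \frac{q_{30}q_{03}}{288}\left(1 + 2\sec^3\theta - 3\sec^5\theta\right)$$
$$+ \frac{q_{30}q_{03}}{8}\tan^2\theta\sec^3\theta - \frac{q_{30}^2 + q_{03}^2}{16}\sin\theta\left(\sec^5\theta - 1\right)$$
$$= \theta + \frac{q_{30}^2 + q_{03}^2}{96}\left[5\tan\theta + \tfrac{10}{3}\tan^3\theta + \tan^5\theta - 6\tan\theta\sec^4\theta + 6\sin\theta\right] + \frac{q_{30}q_{03}}{288}\left[1 + 2\sec^3\theta - 3\sec^5\theta + 36\tan^2\theta\sec^3\theta\right]$$
$$= \theta + \frac{q_{30}^2 + q_{03}^2}{96}\left[-\tan\theta - \tfrac{26}{3}\tan^3\theta - 5\tan^5\theta + 6\sin\theta\right] + \frac{q_{30}q_{03}}{288}\left[33\sec^5\theta - 34\sec^3\theta + 1\right] \qquad\ldots(4).$$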


Let us consider a few values of r, with the corresponding θ's; if we denote by
L the coefficient of $q_{30}^2 + q_{03}^2$, and by M the coefficient of $q_{30}q_{03}$, we have the
following table:


TABLE I.

     r        L         M

    ±1.0    ±.0046    .0569
    ±.9     ±.0095    .0394
    ±.8     ±.0121    .0273
    ±.7     ±.0129    .0187
    ±.6     ±.0125    .0125
    ±.5     ±.0113    .0081
    ±.4     ±.0096    .0049
    ±.3     ±.0075    .0026
    ±.2     ±.0051    .0011
    ±.1     ±.0026    .0003
     0       .0000    .0000
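
The entries of Table I follow directly from equation (4). A minimal sketch, assuming the bracketed expressions of (4) as reconstructed above and using the hypothetical names `coeff_L` and `coeff_M`:

```python
import math


def coeff_L(r):
    """Coefficient of (q30^2 + q03^2) in equation (4), with r = 2 sin(theta)."""
    theta = math.asin(r / 2.0)
    t = math.tan(theta)
    return (-t - 26.0 / 3.0 * t ** 3 - 5.0 * t ** 5 + 6.0 * math.sin(theta)) / 96.0


def coeff_M(r):
    """Coefficient of q30*q03 in equation (4)."""
    sec = 1.0 / math.cos(math.asin(r / 2.0))
    return (33.0 * sec ** 5 - 34.0 * sec ** 3 + 1.0) / 288.0


if __name__ == "__main__":
    for tenth in range(10, -1, -1):
        r = tenth / 10.0
        print("%.1f  %.4f  %.4f" % (r, coeff_L(r), coeff_M(r)))
```

L is an odd function of r and M an even one, which is why the table carries the double sign on L only.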


Now in the method developed by Professor Pearson, enabling us to obtain the
correlation coefficient from the correlation coefficient between ranks, which is used
instead of the correlation coefficient between grades, we have $r = 2\sin\dfrac{\pi\rho}{6}$, on the
assumption of normal correlation. Let us consider how much difference this
assumption of normal correlation makes when the correlation surface is really skew.
Equation (4) giving

$$\frac{\pi\rho}{6} = \theta + \left(q_{30}^2 + q_{03}^2\right)L + q_{30}q_{03}M \qquad\ldots(5),$$

taken in conjunction with the preceding table, shows that $\dfrac{\pi\rho}{6} > \theta$ when $q_{30}$ and $q_{03}$
are both positive, if r and therefore θ are positive. $\dfrac{\pi\rho}{6}$ is greater numerically than
θ when $q_{30}$ and $q_{03}$ are of opposite signs, if r and therefore θ are both negative.


If r is positive and $q_{30}$ and $q_{03}$ are of opposite signs, $\dfrac{\pi\rho}{6} - \theta$ for certain values of $q_{30}$ and
$q_{03}$ may be negative. The following table shows certain values of r calculated by
the usual method when the surface is really skew, for $q_{30} = q_{03}$.
TABLE II.
Values of r calculated by Method of Ranks and Grades.

                          Values of q30 = q03
    Real r    0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9

     1.0     1.001  1.005  1.010  1.018  1.029  1.041  1.055  1.072  1.091
      .9      .901   .905   .910   .917   .926   .937   .951   .966   .984
      .8      .801   .804   .809   .815   .823   .834   .846   .860   .876
      .7      .701   .703   .707   .713   .720   .730   .741   .753   .767
      .6      .601   .603   .606   .611   .617   .625   .634   .645   .658
      .5      .500   .502   .505   .510   .515   .521   .529   .538   .548
      .4      .400   .402   .404   .408   .412   .417   .423   .430   .438
      .3      .300   .301   .303   .306   .309   .313   .317   .322   .328
      .2      .200   .201   .202   .204   .206   .208   .211   .214   .218
      .1      .100   .101   .101   .102   .103   .104   .105   .107   .109

For instance suppose the two variables are distributed according to a skew
distribution where $q_{30}$ and $q_{03}$ are 0.5 and the real coefficient of correlation between
them is 0.6. Then θ is obtained as the angle whose sine is 0.3.


The equation (5) above shows that the correlation by ranks would be given by

$$\frac{\pi\rho}{6} = \sin^{-1}0.3 + \frac{L}{2} + \frac{M}{4}.$$

Now if the correlation coefficient were calculated from this ρ in the usual
manner we should find $2\sin\dfrac{\pi\rho}{6}$, which would be in this case .617. Thus the error
made in assuming normal correlation where the correlation is skew is in this case
in excess by about 3%. A glance at Table II shows that for the cases considered
the errors (all in excess) are never more than 10%. Equation (5) shows that for
given $q_{30}$ (or $q_{03}$) the difference between $\dfrac{\pi\rho}{6}$ and θ is greatest when $q_{03} = q_{30}$
(or $q_{30} = q_{03}$), so we may say that the value of r as found by the usual formula
$\left(r = 2\sin\dfrac{\pi\rho}{6}\right)$ is never more than 10% out, so long as the third moment coefficients
($q_3$'s) of the individual variable distributions are not greater than 0.9. Professor
Edgeworth calls slightly abnormal frequency distributions for which $q_3$ is less than
0.5, and moderately abnormal distributions for which $q_3$ lies between 0.5 and 0.85
roughly. So we may say that when the skew distributions of the correlated
variables are slightly abnormal, the error made in the calculation of r by the ranks
correlation method assuming normal distributions is not more than 3%; and for
moderately abnormal distributions the error is not more than 9%. Thus, when
we consider that this method of finding the coefficient of correlation between two
variables is generally used when the total frequency is not very large, and fairly
large probable errors are involved, we realise that the error introduced, due to the
distribution of frequency being slightly or moderately skew, is not very large, when
the probable errors involved are taken into account.
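
The worked instance above, and any entry of Table II, can be recomputed from equation (5). The sketch below assumes $q_{30} = q_{03} = q$ and nearly linear regression, as in the table; the function name `r_from_ranks_when_skew` is an illustrative assumption.

```python
import math


def r_from_ranks_when_skew(real_r, q):
    """Value of r given by the usual formula 2 sin(pi*rho/6) when the surface
    is really skew with q30 = q03 = q, using equation (5)."""
    theta = math.asin(real_r / 2.0)
    t = math.tan(theta)
    sec = 1.0 / math.cos(theta)
    L = (-t - 26.0 / 3.0 * t ** 3 - 5.0 * t ** 5 + 6.0 * math.sin(theta)) / 96.0
    M = (33.0 * sec ** 5 - 34.0 * sec ** 3 + 1.0) / 288.0
    pi_rho_over_6 = theta + 2.0 * q * q * L + q * q * M
    return 2.0 * math.sin(pi_rho_over_6)


if __name__ == "__main__":
    value = r_from_ranks_when_skew(real_r=0.6, q=0.5)
    print("calculated r:", value)
    print("excess, per cent:", 100.0 * (value - 0.6) / 0.6)
```

For real r = 0.6 and q = 0.5 this should land close to the tabled .617, an excess of about 3 per cent.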


Note on integration from $-\infty$ to $+\infty$. So long as the q's in the equation of
the surface are small, the amount of negative frequency for values of x, y beyond
the bounding curve given by

$$Z = \tfrac{1}{6}\left(q_{30}Z_{30} + 3q_{21}Z_{21} + 3q_{12}Z_{12} + q_{03}Z_{03}\right)$$
is negligible, because the limits of this curve are distant more than three times the
standard deviation from the mean. See a paper by the author in the Statistical
Journal, Vol. LXXXVIII. Part IV, where this matter is discussed in the case of a
moderately abnormal distribution.




















ON THE EXCESS MORTALITY OF MALES IN THE 
FIRST YEAR OF LIFE. 


By MAJOR GREENWOOD and E. M. NEWBOLD.

Professor F. Lenz of Munich contributes to the Arch. f. Hygiene (Bd. xcu.
S. 126-150) a paper entitled “Die Uebersterblichkeit der Knaben im Lichte der
Erblichkeitslehre,” which suggests some interesting reflections.





At the outset, Lenz remarks that the Mendelian inheritance of sex is now 
established (“kann heute als sichergestellt gelten”), Hence, he thinks, we can 
explain the fact, or alleged fact, that recessive pathological hereditary characters 
are only found in males. From these hypotheses—the correctness of which we do 
not intend to discuss—he infers that a larger fraction of total male mortality in 
infancy is selective than of female mortality. If this be so, it would follow, he 
suggests, that when a non-selective factor, or relatively non-selective factor, such 
as a prejudicial general environmental change, e.g. a hot summer or an outbreak 
of an epidemic, heightens the whole of the mortality of the first year of life, the 
relative excess of male mortality should be reduced. Lenz’s method of testing 
this proposition is to correlate* the ratio of the rate of mortality on males to that 
on females with the rate of mortality on both sexes in 11 series of annual rates of 
mortality for different countries (in most cases the number of years included 
is very small, 8 or 10, in only one does it exceed 40) and he finds that in the great 
majority the value of r is large and negative. 





It will be interesting to examine Lenz’s argument from a biometric point of
view. If x is the rate of male mortality (Lenz uses, and we shall use, the
conventional measure, ratio of deaths under 1 to births in the year) and y that of
female mortality, what Lenz is correlating is $x/y$ and $\dfrac{Ax + By}{A + B}$. But in his series
A and B are approximately constant and each is nearly equal to 0.5, so that


* In addition to the ordinary product moment r, Lenz uses a coefficient invented by himself which
he proposes to name the “deutscher Korrelations Index, k.” It appears from the definition and worked
example of this coefficient on p. 147, that if in a given set of data, there is no instance in which
a value of one variable greater than its mean is associated with a value of the other variable less than
its mean, the index must be equal to +1, or if every deviation in excess of one mean is associated
with a deviation in defect of the other mean, the index will be −1, whatever the magnitude of the
corresponding deviations. But on p. 132 there is a set of 8 observations (France) giving a product
moment coefficient of −.86 and such that every positive deviation of one variable is associated with a
negative deviation of the other variable. But the German index is stated to be not −1.0 but −0.64.
We are therefore not sure that we know what the German index really is, and cannot therefore say
what merits, if any, it possesses.
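approximately we have the correlation of $x/y$ with $x + y$. To a first approximation
this is equal to

$$r_{\frac{x}{y},\,(x+y)} = \frac{\dfrac{\sigma_x^2}{\bar{x}} - \dfrac{\sigma_y^2}{\bar{y}} + r_{xy}\sigma_x\sigma_y\left(\dfrac{1}{\bar{x}} - \dfrac{1}{\bar{y}}\right)}{\sqrt{\sigma_x^2 + \sigma_y^2 + 2r_{xy}\sigma_x\sigma_y}\;\sqrt{\dfrac{\sigma_x^2}{\bar{x}^2} + \dfrac{\sigma_y^2}{\bar{y}^2} - \dfrac{2r_{xy}\sigma_x\sigma_y}{\bar{x}\bar{y}}}} \qquad\ldots(1).$$

The last term of the numerator is always negative because $r_{xy}$ is positive and
$\bar{x}$ is greater than $\bar{y}$. So that if $\dfrac{\sigma_x^2}{\bar{x}} - \dfrac{\sigma_y^2}{\bar{y}}$ is not positive and greater in absolute
value than $r_{xy}\sigma_x\sigma_y\left(\dfrac{1}{\bar{x}} - \dfrac{1}{\bar{y}}\right)$ then $r_{\frac{x}{y},\,(x+y)}$ must be negative. Obviously then if
$\dfrac{\sigma_y^2}{\bar{y}} > \dfrac{\sigma_x^2}{\bar{x}}$ the value of $r_{\frac{x}{y},\,(x+y)}$ is always negative.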
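
Equation (1) is easily evaluated for any set of means, standard deviations and correlation. The following is a minimal sketch of the first-order formula as reconstructed above; the function name and the numerical inputs in the usage lines are hypothetical values chosen for illustration, not data from Lenz.

```python
import math


def ratio_sum_correlation(mean_x, sd_x, mean_y, sd_y, r_xy):
    """First-order correlation of x/y with x + y, equation (1)."""
    numerator = (sd_x ** 2 / mean_x - sd_y ** 2 / mean_y
                 + r_xy * sd_x * sd_y * (1.0 / mean_x - 1.0 / mean_y))
    sd_of_sum = math.sqrt(sd_x ** 2 + sd_y ** 2 + 2.0 * r_xy * sd_x * sd_y)
    v_x = sd_x / mean_x  # coefficient of variation of x (as a fraction)
    v_y = sd_y / mean_y
    # Standard deviation of x/y divided by the mean ratio, to first order.
    rel_sd_of_ratio = math.sqrt(v_x ** 2 + v_y ** 2 - 2.0 * r_xy * v_x * v_y)
    return numerator / (sd_of_sum * rel_sd_of_ratio)


if __name__ == "__main__":
    # Hypothetical rates: male mean above female mean, female relatively more variable.
    print(ratio_sum_correlation(130.0, 20.0, 100.0, 18.0, 0.97))
```

When the second variable has the larger value of $\sigma^2/\text{mean}$ and the larger mean belongs to the first, both parts of the numerator are negative and so is the result, which is the case the text describes.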


Let us examine the condition for a negative correlation more closely. Write

b = male births, d = male deaths,
b′ = female births, d′ = female deaths,

and assume a constant sex ratio, viz. $b = kb'$.

Let $M = \dfrac{d + d'}{b + b'}$, $m = \dfrac{d}{b}$ and $m' = \dfrac{d'}{b'}$.

Then $M = \dfrac{km}{1 + k} + \dfrac{m'}{1 + k}$ and $\delta M = \dfrac{1}{1 + k}\left(k\,\delta m + \delta m'\right)$.

Put $z = \dfrac{m}{m'}$, so that, to a first approximation, $\dfrac{\delta z}{z} = \dfrac{\delta m}{m} - \dfrac{\delta m'}{m'}$.

Summing $\delta M\,\delta z$, we have

$$r_{Mz}\,\sigma_M\sigma_z = \frac{1}{m'^2(1 + k)}\left[km'\sigma_m^2 - m\sigma_{m'}^2 + r_{mm'}\sigma_m\sigma_{m'}\left(m' - km\right)\right].$$

Then writing $v_m = \dfrac{\sigma_m}{m}$, $v_{m'} = \dfrac{\sigma_{m'}}{m'}$,
the sign of $r_{Mz}$ is the same as that of

$$km\,v_m^2 - m'v_{m'}^2 + r_{mm'}v_mv_{m'}\left(m' - km\right),$$

i.e. of $km - m'h^2 + r_{mm'}h\left(m' - km\right)$, where $h = \dfrac{v_{m'}}{v_m}$,
which is negative if the positive root of
$$m'h^2 - hr_{mm'}\left(m' - km\right) - km = 0$$
is less than h,
i.e. if
$$\frac{1}{2m'}\left[r_{mm'}\left(m' - km\right) + \sqrt{r_{mm'}^2\left(m' - km\right)^2 + 4kmm'}\right]$$
is less than h. For $r_{mm'} = 1$, the root is 1; thus, since $r_{mm'}$ is of the order of .97,
the root in general differs little from unity, and (since k is not less than 1 and m not less than m′) usually,
but not necessarily, exceeds unity.
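
The critical value of h can be computed from the quadratic just given. A minimal sketch follows; the function names and the numerical rates used in the usage lines are assumptions made purely for illustration.

```python
import math


def critical_h(m, m_prime, k, r):
    """Positive root of m' h^2 - h r (m' - k m) - k m = 0."""
    b = r * (m_prime - k * m)
    return (b + math.sqrt(b * b + 4.0 * k * m * m_prime)) / (2.0 * m_prime)


def sign_expression(h, m, m_prime, k, r):
    """k m - m' h^2 + r h (m' - k m); it has the same sign as r_Mz."""
    return k * m - m_prime * h * h + r * h * (m_prime - k * m)


if __name__ == "__main__":
    # Hypothetical male and female rates, sex ratio of births and correlation.
    m, m_prime, k, r = 0.13, 0.10, 1.04, 0.97
    root = critical_h(m, m_prime, k, r)
    print("critical h:", root)
    print("expression just above the root:", sign_expression(root * 1.01, m, m_prime, k, r))
```

Whenever the ratio h of the female to the male coefficient of variation exceeds this root, the expression is negative and so is the correlation of the male/female ratio with the total mortality.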
So far as the short series given by Lenz are concerned, (1) seems adequate.
Thus he gives the following figures for Hungary 1906-15.

    Year    Male Mortality    Female Mortality    Both sexes      Per cent. excess
            per 1000          per 1000            per 1000        of Male/Female

    1906        22.0              18.8            20.5 (20.4)
    1907        23.5              19.4            20.8 (20.8)
    1908        21.5              18.2            19.9 (19.9)
    1909        22.8              19.5            21.2 (21.2)
    1910        20.9              17.9            19.4 (19.4)
    1911        22.3              19.0            20.7 (20.7)
    1912        20.0              17.1            18.6 (18.6)
    1913        21.7              18.5            20.1 (20.1)
    1914        21.0              17.9            19.5 (19.5)
    1915        28.2              24.5            26.4 (26.4)

Save in the first case, the sex ratio is, to the number of places given, unity.

Lenz finds by direct calculation (we have verified this) that the correlation of
the last two columns is −.77 and, if the data for 1915 are omitted, +.16
(according to us +.17).

From columns (2) and (3) we deduce

                                  omitting 1915
    $\bar{x}$ =      22.29,          21.63,
    $\sigma_x$ =      2.12,           .838,
    $\bar{y}$ =      19.05,          18.444,
    $\sigma_y$ =      1.94,           .703,
    $v_x$ =           .095,           .0387,
    $v_y$ =           .102,           .0381,
    $r_{xy}$ =        .999,           .996.

Using (1) we have $r_{\frac{x}{y},\,(x+y)} = -.81$ and $+.18$ respectively, a sufficiently good
approximation. Of course the data are too scanty to be of more than arithmetical
interest.

The same negative correlation has been pointed out by the Registrar General
for England and Wales in his Statistical Review for 1921 (p. 22 et seq.). In an
interesting discussion he estimates the share in this increasing male excess which
must result as a natural consequence of the greater male mortality in the earlier
months of the first year of life to be a comparatively small one. He examines
the facts in the light of the theory that a larger proportion of the males are
essentially non-viable, a somewhat similar theory to that of Lenz; but as the excess
mortality of males is less in the cases of prematurity and congenital malformation
(causes from which a large proportion at least of deaths may be assumed to be
inevitable) than for All Causes, he finds the facts difficult of explanation, and
reaches no definite conclusion.
That the general variability of mortality in the first year of life is greater
in females than in males in England and Wales sufficiently appears from the
following examples. Taking first the 55 Registration Counties (experience 1901-10)
we find:

    $\bar{x}$ = 125.2,    $\sigma_x$ = 20.46,    $v_x$ = .16,
    $\bar{y}$ = 100.82,   $\sigma_y$ = 17.66,    $v_y$ = .18.

Taking 86 Registration Districts wherein the total births in 1901-10 were
between 10,000 and 20,000:

    $\bar{x}$ = 125.05,   $\sigma_x$ = 18.17,    $v_x$ = .15,
    $\bar{y}$ = 99.84,    $\sigma_y$ = 15.62,    $v_y$ = .16.

Taking (1) 91 Registration Districts with proportion of Male Births from
.506-.508, and (2) 84 Registration Districts with proportion of Male Births from
.508-.510, we have:

                      (1)           (2)
    $\bar{x}$ =     129.637,      127.179,
    $\bar{y}$ =     104.089,      101.048,
    $\sigma_x$ =     31.999,       31.556,
    $\sigma_y$ =     27.807,       28.410,
    $v_x$ =            .247,         .248,
    $v_y$ =            .267,         .281.

In each case $v_y$ exceeds $v_x$.


We propose now to examine the variabilities of Specific causes of death.


The groups chosen were (a) All Causes, (b) Measles, (c) Whooping Cough,
(d) Congenital Debility, (e) Diphtheria and Croup, (f) Tuberculosis (all kinds),
(g) Other Diseases of the Respiratory Organs, (h) Diarrhoeal Diseases.

The exact list of headings included in each of these groups is given in Table A
of the Appendix. Owing to changes in classification, and varying degrees of
accuracy of assignment to cause over a long period, some caution would probably
have to be observed if we were comparing the absolute values of these rates at
different periods, but as we are dealing with the relative male and female rates
this consideration is not of much weight.
We have varied our unit of observation to get (using Tschouproff’s terms) both
a static and a dynamic variation. For the dynamic or secular we have taken the 
calendar year as unit, and traced the infant mortality of England and Wales, and 
of London, separately down to 1922; the date of beginning varies for some causes, 
but is usually 1847 for England and Wales, and 1845 for London. In some cases, 
where for one reason or another a break is indicated, we have tested our 
results by recalculating for shorter periods. For the static variation we have 
taken first a geographical variation, with the unit of observation a Registration 
County* of England and Wales (Urban and Rural separately), all for the period 
1911-1923 inclusive, and secondly an occupational variation with the unit of
observation the occupation† of the child’s father. This last is solely for 1911
and is only available for All Causes. Before going into the results it is perhaps 
worth remembering that though we propose to compare male and female variation 
in mortality, we are not doing so in the same sense as when we compare male 
and female variation in individual characters such as height, weight, cephalic 
index, etc. In the one case the unit is an individual, and each individual can 
have any numerical value of the quality considered, in the other case the unit is 
a group of individuals—grouped either by time, place or occupation—and it is 
only in the case of the group that the quality can have a series of different 
numerical values. In other words to say that a set of groups of females is 
(absolutely or relatively) more variable than a set of groups of males under 
apparently the same conditions is to say that the same increment of adverse 
conditions to both, causes (absolutely or relatively) more additional female than 
male deaths.

Since we are dealing with the variation of a ratio, we will first see how much
of the variation may be due to simple sampling, and how this part differs in the
male and female mortality, though it is, of course, abundantly clear without any
calculation at all, that our figures differ from those of a case of simple sampling
in more than one respect.

Suppose q is the rate of mortality per person in any population and we take
a series of samples of size n out of the population at random, the standard
deviation of q will be $\sqrt{\dfrac{pq}{n}}$ where $p = 1 - q$, its Coefficient of Variation $100\sqrt{\dfrac{p}{nq}}$,
and the quantity $\dfrac{\sigma_q^2}{q}$ (the $\dfrac{\sigma_x^2}{\bar{x}}$ of equation (1), on which the sign of Lenz’s correlation
depends), will be reduced to $\dfrac{p}{n}$.
* The counties where the total births (Rural and Urban, male and female together) did not exceed
20,000 have been grouped as follows:
Anglesey and Carnarvon.
Brecknock and Cardigan.
Merioneth, Montgomery and Radnor.
Ely, Huntingdon, Peterborough and Rutland.
Westmorland and Cumberland.
Isle of Wight and Southampton.
This gave 52 Counties in each set.
† Only those occupations were included where the numbers of births of both male and female
separately were over 1000.
The curves of these three variables for increasing values of $q$ and constant value
of $n$ are shown in Figures 1, 2, and 3. The values of $q$ with which we are
dealing are all very small so that we are only concerned with the left-hand part of
the graphs. For these it is clear that if simple sampling alone is in question,
the standard deviation increases, while the coefficient of variation and the function
$\sigma_q^2/q$ decrease, with increasing $q$ for constant $n$. If, as in our cases, the numbers in
the samples vary, we simply have to replace $n$ by the Harmonic Mean of its
various values. Hence if the numbers of births of boys and girls were really
(instead of only approximately) equal, then since the male mortality is, for all
cases except whooping cough, greater than the female, the male standard deviation
would be greater but the male coefficient of variation, and also the function
$\sigma_q^2/q$, less than the female. Hence the negative correlation observed by Lenz would
occur, not only for all causes, but for each cause separately (except whooping cough,
where it would be positive). As a matter of fact, since the male births in reality
always slightly exceed the female, $n$ and $q$ increase together, so that what variation
there is in $n$ tends (except for whooping cough, where the reverse is the case) in
the case of the standard deviation to counteract its tendency to increase with $q$, and
in the case of the other two functions shown in the figures to accentuate their
tendency to decrease as $q$ increases.
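The behaviour of the three simple-sampling functions can be checked numerically. The following is only an illustrative sketch in Python (not part of the original paper); the function names, the sample size and the rates are assumptions chosen to show the direction of change for small $q$.

```python
import math

def simple_sampling_measures(q, n):
    """Return (s.d., coefficient of variation, variance/mean) of a rate q
    estimated from random samples of size n (simple sampling only)."""
    p = 1.0 - q
    sd = math.sqrt(p * q / n)              # standard deviation of q
    cv = 100.0 * math.sqrt(p / (n * q))    # 100 * sd / q
    var_over_mean = p / n                  # sd**2 / q
    return sd, cv, var_over_mean

def harmonic_mean(sizes):
    """Harmonic mean of the sample sizes, used in place of n when they vary."""
    return len(sizes) / sum(1.0 / n for n in sizes)

# Hypothetical rates and sample size, chosen only for illustration.
n = 20000
for q in (0.001, 0.005, 0.02, 0.1):
    sd, cv, vm = simple_sampling_measures(q, n)
    print(f"q={q:<6} sd={sd:.6f}  cv={cv:8.3f}  var/mean={vm:.8f}")
```

For these small rates the printed standard deviation rises with $q$ while the other two columns fall, which is the left-hand part of the graphs referred to above.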


In all cases and causes, therefore, where the male mortality exceeds the
female and where simple sampling alone holds, we expect, by virtue of equation
(1), a negative correlation between the male/female and the total mortality.

As we have pointed out before, however, it is quite obvious that the simple 
sampling scheme cannot be expected to fit any one of our sets of series. Neither 
in the case of the calendar years, nor of the geographical districts, nor of the 
father’s occupational groups, can the samples of our series be supposed to be 
drawn out of the same bag—we know that in many cases a real difference in the 
chance of infant death exists from year to year and from district to district. 
Such differences would tend to make our observed standard deviations greater 
than the theoretical*. Our series must also, however, depart from those of simple
sampling in another direction: in any single draw of n balls from a bag (or, to
vary our analogy, in any single throw of n dice) the chance of throwing a given
number or not is not always the same from one die to another; in other words,
the chance of death is not alike for everybody in one district, or one calendar 
year, or one occupational group; this is obviously true in any case and it is 
accentuated in the case of special causes whose incidence is largely seasonal or 
epidemic, and thus not spread evenly over the year of exposure. Infants for 


* If $s$ is the observed, and $\sigma = \sqrt{p_0 q_0/n}$ the theoretical standard deviation of $q$, then

$$s^2 = \frac{p_0 q_0}{n} + \frac{n-1}{n}\,\sigma_q^2,$$

where $\sigma_q$ is the standard deviation of the different $q$'s.
instance, born in October, would have reached a more resistant age when the
time came for the next summer’s rise in infantile diarrhoea than those some few 
months later. This uneven weighting of individual dice tends to make the 
observed standard deviation less than the theoretical*. 

Of these opposite tendencies our results show us that the first is in almost 
every case the stronger, in other words the effect of the differences between the 
groups is considerably greater than that of the differences between the chances of 
death of the individuals in a group, for in Table I, the observed standard deviation 
exceeds the theoretical for both male and female. (In the group Diphtheria and 
Croup alone, for the variation from county to county, though not in the secular 
variation, does the theoretical approximate to and in one case even slightly 
exceed the observed variation.) Hence on the whole the first tendency is clearly 
the more important to consider. 
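The two opposite tendencies can be made concrete with a small exact calculation. The sketch below is in Python and is not taken from the paper; the function names and the chances .08 and .12 are hypothetical. It compares the simple-sampling variance with the exact variance when the chance differs from group to group (which raises the variance) and when it differs from individual to individual inside one group (which lowers it).

```python
def binomial_variance(p0, n):
    """Simple-sampling variance of a rate with mean chance p0 and sample size n."""
    return p0 * (1.0 - p0) / n

def variance_between_groups(chances, n):
    """Exact variance of the observed rate when each sample of n individuals
    is drawn with its own chance (one chance per group)."""
    k = len(chances)
    p0 = sum(chances) / k
    within = sum(p * (1.0 - p) / n for p in chances) / k
    between = sum((p - p0) ** 2 for p in chances) / k
    return within + between

def variance_within_group(chances):
    """Exact variance of the observed rate in a single sample whose
    individuals have different chances (one chance per individual)."""
    n = len(chances)
    return sum(p * (1.0 - p) for p in chances) / n ** 2

# Hypothetical chances of death, for illustration only.
n = 1000
theoretical = binomial_variance(0.10, n)
groups_differ = variance_between_groups([0.08, 0.12], n)
individuals_differ = variance_within_group([0.08] * 500 + [0.12] * 500)
print(theoretical, groups_differ, individuals_differ)
```

With these made-up figures the group-to-group differences inflate the variance far more than the individual differences deflate it, which is the situation the text describes for Table I.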

We have no a priori reason for supposing that this variation of the districts,
years or occupations will follow the same sort of course with increasing q as does
that part due to simple sampling, as we have no a priori knowledge of their
distribution. From our empirical results in Table I, we see that as a matter of
fact the nett observed standard deviations and coefficients of variation, do, the 
former always, the latter only on the whole and with several exceptions (e.g. 
especially in some of the groups for measles, diphtheria and croup, and respiratory 
diseases), behave in the same way as the simple sampling variations, i.e. in general 
the male standard deviation is greater than the female standard deviation and the 
female coefficient of variation more often greater than less than the male coefficient 
of variation (with the exception of whooping cough which is in both cases reversed). 
This general rule for the absolute and relative variation has been tested also for 
all causes, England and Wales, when the variation is measured by the average 
absolute difference between each pair of consecutive years, and the percentage of 
this difference on the rate for the first of the two years. For England and Wales 
(All Causes, 1847—1922) the figures are : 


                                                  Male     Female
Average absolute difference               ...     8.81      8.08
Average percentage absolute difference    ...     5.93      6.65


If we want to get a true idea of whether we are justified in attaching any 
sort of importance to the difference between male and female variations, we must 
first get rid of that part of it which may be due to random sampling and then 
examine the difference that remains. 


The relative true variation—apart from chance—in the rates is shown in the 
last column of Table I, which gives the value of 


$$\frac{100\sqrt{(\text{observed s. d. of rate})^2 - (\text{simple sampling s. d. of rate})^2}}{\text{mean rate}},$$
* In this case if $s$ is the observed, and $\sigma = \sqrt{p_0 q_0/n}$ the theoretical standard deviation of $q$, then

$$s^2 = \frac{p_0 q_0}{n} - \frac{\sigma_c^2}{n},$$

where $\sigma_c$ is the standard deviation of the individual chances.
TABLE I (a). Secular Variation in Infant Mortality by Causes and Sex.

                                             Mean             Standard Deviation          Coefficient of Variation
Group                              Sex       mortality        Observed        Simple      Observed         Simple       *
                                             per 1000         per 1000        Sampling                     Sampling
                                                                              per 1000
All Causes:
  England and Wales, 1839–1922     Male      153.85±1.68      22.86±1.19      .580        14.86±.79        .377        14.85
                                   Female    126.19±1.50      20.37±1.06      .545        16.14±.86        .432        16.14
  London, 1839–1922                Male      157.90±1.88      25.49±1.33      1.606       16.14±.86        1.017       16.11
                                   Female    132.50±1.72      23.35±1.22      1.523       17.62±.95        1.149       17.59
Measles:
  England and Wales, 1847–1922     Male      2.66±.057        .737±.040       .081        27.70±1.63       3.04        27.54
                                   Female    2.28±.046        .600±.033       .076        26.29±1.53       3.35        26.10
  England and Wales, 1847–1913     Male      2.74±.053        .639±.037       –           23.34±1.43       –           –
                                   Female    2.35±.042        .512±.030       –           21.76±1.33       –           –
  London, 1845–1922                Male      3.48±.087        1.137±.061      .252        32.70±1.95       7.25        44.71
                                   Female    2.99±.076        .993±.054       .231        33.22±1.98       7.97        32.30
Whooping Cough:
  England and Wales, 1847–1922     Male      5.16±.098        1.27±.069       .112        24.54±1.43       2.18        24.52
                                   Female    5.80±.107        1.38±.076       .122        23.80±1.37       2.10        23.70
  England and Wales, 1847–1913     Male      5.43±.084        1.020±.059      –           18.77±1.132      –           –
                                   Female    6.10±.091        1.098±.064      –           17.99±1.082      –           –
  London, 1845–1922                Male      6.64±.185        2.426±.131      .348        36.56±2.293      5.24        36.22
                                   Female    7.31±.203        2.656±.143      .372        36.34±2.206      5.09        36.03
Diphtheria and Croup:
  England and Wales, 1855–1922     Male      .865±.045        .545±.032       .045        63.04±4.885      5.23        62.79
                                   Female    .668±.034        .415±.024       .041        62.11±4.781      6.07        61.83
  London, 1859–1922                Male      1.094±.049       .575±.034       .136        52.57±3.91       12.40       51.07
                                   Female    .851±.034        .404±.024       .120        47.40±3.40       14.07       45.33
  London, 1866–1922                Male      .961±.039        .438±.028       –           45.64±3.43       –           –
                                   Female    .769±.031        .341±.022       –           44.40±3.31       –           –
  England and Wales, 1866–1922     Male      .686±.031        .345±.022       –           50.37±3.91       –           –
                                   Female    .528±.023        .254±.016       –           48.13±3.68       –           –
Tuberculosis (all kinds):
  England and Wales, 1847–1922     Male      8.465±.241       3.115±.170      .144        36.80±2.27       1.70        36.82
                                   Female    6.768±.195       2.524±.138      .131        37.30±2.31       1.94        37.18
  London, 1845–1922                Male      11.68±.367       4.802±.259      .460        41.10±2.57       3.94        40.91
                                   Female    9.44±.377        4.932±.266      .422        52.23±3.51       4.47        52.03
Respiratory Diseases (other than Tuberculosis):
  England and Wales, 1847–1922     Male      26.626±.327      4.225±.231      .253        15.866±.890      .949        15.86
                                   Female    20.755±.258      3.335±.183      .228        16.068±.902      1.10        16.05
  London, 1845–1922                Male      30.188±.415      5.489±.294      .733        18.018±.415      2.43        17.86
                                   Female    24.130±.328      4.288±.232      .670        17.772±.990      2.78        17.56
  London, 1847–1913                Male      31.37±.382       4.64±.270       –           14.783±.880      –           –
                                   Female    25.14±.293       3.56±.207       –           14.151±.841      –           –
  London, 1847–1922                Male      30.23±.426       5.50±.301       –           18.189±1.028     –           –
                                   Female    24.21±.334       4.32±.236       –           17.824±1.006     –           –
Diarrhoea:
  England and Wales, 1847–1922     Male      19.080±.573      7.405±.405      .215        38.812±2.422     1.13        38.82
                                   Female    16.098±.523      6.753±.370      .202        41.953±2.669     1.25        41.91
  London, 1845–1922                Male      22.506±.571      7.481±.404      .635        33.242±1.98      2.82        33.12
                                   Female    19.424±.534      6.992±.378      .603        35.995±2.18      3.10        35.85
Congenital Debility:
  England and Wales, 1847–1922     Male      44.43±.275       3.55±.194       .323        7.99±.440        .728        7.96
                                   Female    36.58±.258       3.33±.182       .301        9.11±.502        .822        9.07
  England and Wales, 1847–1913     Male      45.32±.220       2.67±.156       –           5.89±.344        –           –
                                   Female    37.48±.191       2.31±.135       –           6.17±.361        –           –
  London, 1845–1922                Male      39.03±.303       3.970±.214      .830        10.17±.555       2.13        9.95
                                   Female    32.36±.250       3.275±.177      .773        10.12±.552       2.39        9.85
  London, 1847–1922                Male      39.29±.282       3.643±.199      –           9.27±.512        –           –
                                   Female    32.52±.241       3.110±.170      –           9.56±.528        –           –
  London, 1847–1913                Male      39.94±.265       3.221±.188      –           8.064±.473       –           –
                                   Female    33.28±.191       2.323±.135      –           6.981±.409       –           –

* $100\sqrt{(\text{observed s. d.})^2 - (\text{simple sampling s. d.})^2}\,/\,\text{mean}$
TABLE I (b). Geographical Variation in Infant Mortality by Causes.

ENGLAND AND WALES.

52 Registration Counties† (1911–1923 inclusive).


















































                                             Mean              Observed Standard     Observed Coefficient
Group                              Sex       mortality         Deviation             of Variation
                                             per 1000          per 1000
All Causes:
  Rural                            Male      84.13±1.36        14.53±.961            17.27±1.18
                                   Female    65.58±1.08        11.59±.767            17.68±1.21
  Urban                            Male      95.13±1.46        15.59±1.03            16.39±1.11
                                   Female    74.21±1.20        12.85±.850            17.32±1.18
Measles:
  Rural                            Male      .931±.054         .574±.038             61.64±5.41
                                   Female    .778±.045         .477±.032             61.30±5.37
  Urban                            Male      1.615±.066        .703±.047             43.53±3.38
                                   Female    1.391±.053        .572±.038             41.10±3.15
Whooping Cough:
  Rural                            Male      2.969±.070        .747±.049             25.16±1.77
                                   Female    3.371±.064        .685±.045             20.318±1.40
  Urban                            Male      2.957±.074        .794±.053             26.854±1.90
                                   Female    3.528±.083        .890±.059             25.23±1.77
Diphtheria and Croup:
  Rural                            Male      .117±.009         .097±.006             83.08±8.48
                                   Female    .095±.008         .081±.005             85.30±8.84
  Urban                            Male      .193±.012         .132±.009             68.48±6.30
                                   Female    .140±.014         .147±.010             105.07±12.45
Tuberculosis (all kinds):
  Rural                            Male      1.655±.054        .575±.038             34.77±2.56
                                   Female    1.292±.039        .421±.028             32.59±2.37
  Urban                            Male      2.321±.069        .735±.049             31.68±2.30
                                   Female    1.822±.077        .821±.054             45.05±3.53
Respiratory Diseases (other than Tuberculosis):
  Rural                            Male      14.80±.338        3.61±.239             24.41±1.71
                                   Female    11.07±.271        2.89±.191             26.13±1.84
  Urban                            Male      17.83±.404        4.32±.286             24.21±1.69
                                   Female    13.15±.333        3.56±.236             27.09±1.92
Diarrhoea:
  Rural                            Male      7.416±.268        2.861±.189            38.59±2.91
                                   Female    5.79±.218         2.399±.154            41.74±3.21
  Urban                            Male      11.100±.311       3.320±.220            29.91±2.15
                                   Female    8.472±.305        3.260±.216            38.48±2.90
Congenital Debility:
  Rural                            Male      35.30±.364        3.893±.258            11.026±.738
                                   Female    28.07±.318        3.403±.225            12.124±.814
  Urban                            Male      35.23±.492        5.261±.348            14.932±1.009
                                   Female    29.28±.395        4.223±.279            14.421±.973
100 (obs 


served 8, d.)?— (simple sampling s. d.)*_ 


Simple 
Sampling 





* 


(see below) 


17°06 


17 
16 
17 
55 


53 
37 


32°90 


20 
14 


21°18 


19 


32 


imaginary 


21 


“39 
*L5 
‘Ol 
“50 
"35 
35 


16 


“29 


“94 


‘97 


97 


70°45 


28 


22°70 
25°59 


39 


23°45 
24°87 
23°29 


28 


59 


25°87 


37°39 


40°21 


28°69 


37°18 


10°14 


11°05 


14°17 


13°41 








† For grouping of smaller counties see footnote on p. 331.


TABLE I (c). Variation by Occupation of Father, 1911 (Legitimate only). 


92 Occupations with over 1000 male and 1000 female births. 


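            Mean             Standard Deviation            Coefficient of Variation
            mortality        Observed        Simple        Observed         Simple
            per 1000         per 1000        Sampling                       Sampling
                                             per 1000
Male        129.65±2.10      29.88±1.49      7.30          23.05±1.21       5.63
Female      106.50±1.88      26.67±1.33      6.82          25.04±1.32       6.41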
and shows that out of the 33 cases which do not overlap, we get 17 cases where
there still appears a real excess of female variation and 16 where there is equality 
or a male excess. The only cases where we never find a male excess are All 
Causes, and Diarrhoea, and the only case where we never find a female excess is 


Whooping Cough. 
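The quantity in the last column of Table I is easily recomputed from the other columns. The following Python sketch is an illustration only; the function name is an assumption, and the two worked values use the All Causes, England and Wales, 1839–1922 figures of Table I (a). When the simple-sampling standard deviation exceeds the observed one, the square root has no real value, which the table records as "imaginary".

```python
import math

def net_relative_variation(observed_sd, sampling_sd, mean_rate):
    """100 * sqrt(observed_sd**2 - sampling_sd**2) / mean_rate,
    or None when the difference of squares is negative ('imaginary')."""
    diff = observed_sd ** 2 - sampling_sd ** 2
    if diff < 0:
        return None
    return 100.0 * math.sqrt(diff) / mean_rate

# All Causes, England and Wales, 1839-1922 (values from Table I (a)).
male = net_relative_variation(22.86, 0.580, 153.85)
female = net_relative_variation(20.37, 0.545, 126.19)
print(round(male, 2), round(female, 2))

# Hypothetical figures in which simple sampling exceeds the observed s.d.
print(net_relative_variation(1.0, 1.2, 10.0))
```

The first line printed reproduces the tabled 14.85 and 16.14; the second shows the "imaginary" case with made-up inputs.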


These cases are not all equally fitted for the application of Lenz's argument,
but we shall return to this point later, and for the present consider simply the
numerical values apart from the causes. Now an excess female relative variation
is not enough to produce a negative correlation, unless it is large enough for
$\sigma_x^2/\bar{x} - \sigma_y^2/\bar{y}$ to be also negative, or if positive,
$< r_{xy}\sigma_x\sigma_y\left(\frac{1}{\bar{y}} - \frac{1}{\bar{x}}\right)$. Table II gives, in
the last three columns but one, the observed and simple sampling values of
$\sigma_x^2/\bar{x} - \sigma_y^2/\bar{y}$, and also the sign of $r$ deduced from the approximate formula (1).
Simple sampling alone would, as we have pointed out before, give in every case a
negative sign for the correlation, except in that of whooping cough where it
would be positive by virtue of the predominating term
$r_{xy}\sigma_x\sigma_y\left(\frac{1}{\bar{x}} - \frac{1}{\bar{y}}\right)$. The
signs of $r$ marked with an asterisk in Table II are the cases where both factors
of the numerator have the same sign; where the signs differ we have had to
approximate still further by taking $r_{xy} = .97$ (a value found by experiment). We
have tested this approximation in five of the latter cases by direct calculation
of $r$, and find the signs agree in all cases but one, where the correlation is
negligible. The approximations hence seem justified.
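The sign test described above can be sketched in Python. This is a hedged illustration, not the authors' procedure: it assumes that the numerator of formula (1) has the usual first-order form for the correlation of a ratio $x/y$ with the sum $x + y$, with $x$ the male and $y$ the female rate, and it uses the approximation $r_{xy} = .97$ quoted in the text. The function names are hypothetical; the inputs are the England and Wales means and observed standard deviations of Table I (a).

```python
def covariance_terms(mean_x, sd_x, mean_y, sd_y, r_xy=0.97):
    """Return (first term, second term, total) of the assumed numerator:
    sd_x^2/mean_x - sd_y^2/mean_y + r_xy*sd_x*sd_y*(1/mean_x - 1/mean_y)."""
    first = sd_x ** 2 / mean_x - sd_y ** 2 / mean_y
    second = r_xy * sd_x * sd_y * (1.0 / mean_x - 1.0 / mean_y)
    return first, second, first + second

def correlation_sign(mean_x, sd_x, mean_y, sd_y, r_xy=0.97):
    """Sign (+1, -1 or 0) of the assumed numerator."""
    total = covariance_terms(mean_x, sd_x, mean_y, sd_y, r_xy)[2]
    return (total > 0) - (total < 0)

# Male (x) and female (y) figures from Table I (a), England and Wales.
print("All Causes    ", correlation_sign(153.85, 22.86, 126.19, 20.37))
print("Diarrhoea     ", correlation_sign(19.080, 7.405, 16.098, 6.753))
print("Whooping Cough", correlation_sign(5.16, 1.27, 5.80, 1.38))
```

Under these assumptions the sign comes out negative for All Causes and Diarrhoea and positive for Whooping Cough, and in each case it is the second term, depending only on the difference of the means, that decides it.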


It appears that when the signs of the two terms in the numerator differ, in 
rather more than 2/3 (17 as against 8) of the cases the sign of the second prevails 
over the first. That is to say in the large majority of cases the sign of the corre- 
lation is due simply to the fact that the mean male mortality is larger (or in the 
case of whooping cough smaller) than the female. It is clear also from Table II,
where the causes are arranged in order of size of the absolute excess of the male 
mortality over the female, that the positive correlations tend on the whole to 
appear more frequently when this excess is small or has become a defect as in 
whooping cough. 

This tendency, and the reversal in the case of whooping cough, suggest that 
the correlations (if the variation is not too large) may be thrown back simply on 
to the size of the excess mortality, quite irrespective of which sex is in excess or 
of the cause in question. 


As we pointed out before, Lenz's argument does not apply to a single cause
taken separately. He says “denn wenn die Übersterblichkeit der Knaben im
wesentlichen auf der Auswirkung krankhafter Erbanlagen beruht, so wird sie in
Zeiten wo die Säuglingssterblichkeit aus äusseren Ursachen (italics ours) besonders
hoch ist, verhältnismässig niedriger sein” (pp. 128–129) [for if the excess mortality
of boys rests essentially on the working of morbid hereditary factors, it will be
relatively lower at times when infant mortality from external causes is especially
high]. But from Table II we
find that the negative correlation frequently does apply to a single cause. Take
diarrhoea, for instance: the sign from formula (1) is always negative, and in the
case checked r = −.549 ± .094, a fairly substantial value. Here we have a case
where the “äusseren Ursachen” are at their height, and the “Übersterblichkeit”
of the boys from these same “äusseren Ursachen” is relatively low, which cannot
be explained by Lenz's theory.


If we try to classify the different causes according to whether by formula (1) 
they would give a negative or positive correlation we find that the sign is not 
always the same for the different series treated under each cause. Putting them 
in order of the stability of the negative signs we get (omitting overlapping 
series): 

(1) All negative. Diarrhoea and All Causes. 

(2) ? negative. Tuberculosis, Congenital Debility and Respi- 

ratory Diseases. 

(3) 4 negative and } positive. Diphtheria and Croup. 

(4) } positive. Measles. 

(5) All positive. Whooping Cough. 

If we classify them according to the average excess of the female coefficient of 
variation over the male we get: 

(1) Diphtheria and Croup. 
(2) Tuberculosis. 
(3) Diarrhoea. 
(4) All Causes. 
(5) Congenital Debility. 
(6) Respiratory Diseases. 
Male excess (7) Measles. 
Male excess (8) Whooping Cough. 

If we finally classify them according to that part of the excess of the female 
coefficient of variation over the male which is not due to simple sampling (last 
col. of Table I) we get: 

(1) Diphtheria and Croup. 

(2) Tuberculosis. 

(3) Diarrhoea. 

(4) All Causes. 

(5) Respiratory Diseases. 

(6) Congenital Debility. 
Male excess (7) Whooping Cough. 
Male excess (8) Measles. 

This average classification may be to some extent misleading, as the variations 
are of different type and the geographical ones more unstable owing to smaller 



















numbers (the diphtheria and croup group for instance owes its high position in 
the last two classifications to a single large female coefficient of variation in the 
Urban Counties with a large probable error*). In other cases too, different choice 
of groupings or of periods covered give different results as regards the relative 
size of the male and female coefficients of variation. We must therefore base our 
judgment on the most consistent results. The causes in which the negative 
correlation and the female excess relative variation are most stable are Diarrhoea, 
All Causes and then Tuberculosis, and those with a positive correlation and male 
excess relative variation are Whooping Cough and Measles. It is not easy from 
these results to find support for any theory of connection between sex differences 
in variation and unavoidable mortality attached more to one sex than to the other 
which might be supposed to occur in groups such as All Causes or Congenital 
Diseases, as opposed to the more definitely epidemic causes. 

Consideration of the secular graphs of the course of the mortality and of the
percentage male/female mortality both for England and Wales and for London, shows
that in the causes with a small absolute rate such as measles, and still more
diphtheria and croup, as might be expected, the fluctuations in the male/female
mortality are so large as to mask any general tendency that might be present.
In the causes with large mortality, the excess variation is clearly due mainly but
not solely to the improvement in infant mortality since 1900.


The general impression given both by the tables and the graphs (which are 
not reproduced) is that it is a general rule for a greater absolute variation and a 
less relative variation to go (in any cause or group of causes) with that sex that 
has the higher mortality in that cause, irrespective of which sex it is, but that in 
causes where the absolute mortality is lowest this tendency is obscured or over- 
come so far as regards the relative variation. It is unfortunate that whooping 
cough is the only cause of death which persistently strikes female infants more 
than male, so that this cannot be tested further in that direction. 

To a certain extent these results are unsatisfactory as they do not appear to 
afford a basis for any theory which would give us a clue as to what proportion 
of infant deaths are to any extent “inevitable” in any cause. We do not therefore 
feel justified in doing more than simply presenting the facts of this analysis, 
which do not in our opinion uphold Lenz’s theory, without attempting to come to 
any definite conclusion as to inherent differences in male and female variation. 


* We cannot from these data calculate with any sort of accuracy the probable error of the difference 
between the male and the female coefficients of variation, as without making a probably quite unjustified 
assumption of normality, we do not know the correlation in error between the male and female 
coefficients of variability. 
APPENDIX. TABLE A.





On the Excess Mortality of Males in the First Year of Life 





Diseases included in the various groups, according to the headings in the 


1845—1880 


(a) All Causes 

(6) Measles 

(ec) Whooping Cough 
(ad) Whole of tv. 1 

i.e. Congenital Mal- 
(1858 
and before called 
Diseases of Growth, 
Nutrition and De- |- 
cay) and Develop- 
mental Diseases of 


formations 


children 


also Iv. 4 Atrophy 


and Debility 


(e) Diphtheria 


Cynanche Maligna 
Croup 


(Ff) Whole of Group 11. 2 | 
i.e. Tubercular Dis- 


eases 


(g) Wholeof Group IL. ¢ 
ie, Diseases of the 
Respiratory Organs 


(4) Dysentery 


Registrar General’s Annual Report 


1881—1900 


All Causes 
Measles 


Whooping Cough 





Premature Birth 
Spina bifida 
Imperforate anus 


i.e. Developmental 


1% Dal. a 
= Palatehare and adding De- | 
. bility, Atrophy 


Other Congenital 
Defects 

Cyanosis 

\Debility, Atrophy, Inanition 


and Inanition 


Diphtheria and Croup 


(Tabes Mesenterica 
Tubercular Meningitis (Acute Hydro- 














cephalus) 
|| Phthisis 
| Other forms of Tuberculosis. Scrofula 
Lupus 
Laryngitis i.e. whole of 
Other Diseases of Group vi. 4, | 


Larynx and Trachea 


| 

| Emphysema,Asthma and adding 
|- Bronchitis Laryngismus 
| Pneumonia Stridulus 

| | Pleurisy 


Respiratory System 





| (\ ‘holera 
|} Diarrhoea, Dysentery i.e. Group 1. 2 
and Enteritis 


| and Enteritis and 
|| Gastro-enteritis 

Before 1899 enteritis and gastro-ente- | 

ritis were not distinguished 





1901—-1910 


Numbers refer to the 
Registrar General’s 





1911—-1920 1921—192 











Diseases V, omit- | 
Atelectaris | 




















omitting Croup| 


Other and Undefined Diseases of the | 


All Causes 
Measles 
Whooping Cough 


Congenital Iydrocephalus 
Other Congenital Defects 
Want of Breast Milk 
Atrophy. Debility 


| (Congenital Birth 


Diphtheria and Croup (not 
spasmodic nor membra- 
nous) 

Pulmonary Tuberculosis 
(Tuberculous Phthisis) 
Phthisis (not otherwise de- 

fined) 

| Tuberculous Meningitis 

Peritonitis 


q a 
Tabes Mesenterica 


Lupus 
Tubercle of other organs 
General Tuberculosis 
Scrofula 





( (rons 

| >» . 

Pneumonia) pide 

| Not defined 

| Laryngismus Stridulus 
Laryngitis 

Membranous Laryngitis 
(not Diphtheric) 

Other Diseases of Larynx 
and Trachea 

Bronchitis 

Emphysema 

Asthma 

Pleurisy 

Fibroid Disease of Lung 

Other Diseases of the 
Respiratory System 


‘Diarrhoea due to food 
Infective Enteritis 
Epidemic Diarrhoea 





defined) 
Dysentery 
Enteritis (not epidemic) 
.Gastro-enteritis 























Diarrhoea (not otherwise | 


All Causes All Causes 
TABLE OF THE FIRST TWENTY TETRACHORIC
FUNCTIONS TO SEVEN DECIMAL PLACES.


By ALICE LEE, D.Sc.


(With an Introductory Note by KARL PEARSON.)


THE present table was computed as ancillary to a complete table for tetrachoric
coefficients of correlation, which will shortly be published. It gives the tetrachoric
functions from 0 to 19 to seven decimal places for argument intervals of $h$ ($= x/\sigma_x$)
equal to .1 and proceeds from 0 to 4.0.


The Biometric School defines the tetrachoric function $\tau_s(h)$ by the equation:

$$\tau_s(h) = \frac{(-1)^{s-1}}{\sqrt{s!}}\,\frac{d^{\,s-1}}{dh^{\,s-1}}\left(\frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}h^2}\right) \qquad (1).$$

In place of the tetrachoric function the Scandinavians have used the derivatives of:

$$\phi(h) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}h^2}.$$

This suffers from certain disadvantages; it gives no simple nomenclature for the
integral $\int_h^\infty \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}x^2}\,dx$, which we take as $\tau_0$ or $\tfrac{1}{2}(1-\alpha_h)$ in Sheppard's notation;
and further it leads to much range of variation in the functions themselves, so that
the differences may be very considerable at one part and small at another part of
the table. The simple expedient of using (1) as our function causes all the
tetrachorics to be proper fractions, less than unity. Further (1) is the form which arises
naturally, when we are dealing with tetrachoric coefficients of correlation, and saves
then a large amount of labour.
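Definition (1) can be evaluated directly, since the derivatives of the normal ordinate are Hermite polynomials multiplied by the ordinate itself. The Python sketch below is an illustration under that standard identity, not the computing method used for the table; the function names are assumptions. It reproduces the tabled value $\tau_{13}(1.0)$ = −.038,1455 quoted later in this note.

```python
import math

def normal_ordinate(h):
    """phi(h) = exp(-h^2/2) / sqrt(2*pi)."""
    return math.exp(-0.5 * h * h) / math.sqrt(2.0 * math.pi)

def tetrachoric(s, h):
    """tau_s(h) from definition (1), using
    (-1)^(s-1) d^(s-1) phi / dh^(s-1) = He_(s-1)(h) * phi(h),
    where He_k is the Hermite polynomial with He_k = h*He_(k-1) - (k-1)*He_(k-2).
    tau_0 is taken as the upper tail area of the normal curve."""
    if s == 0:
        return 0.5 * math.erfc(h / math.sqrt(2.0))
    he_prev, he = 0.0, 1.0            # He_(-1) (unused) and He_0
    for k in range(1, s):             # build up to He_(s-1)
        he_prev, he = he, h * he - (k - 1) * he_prev
    return he * normal_ordinate(h) / math.sqrt(math.factorial(s))

print(f"{tetrachoric(13, 1.0):.7f}")
print(f"{tetrachoric(15, 1.0):.7f}")
```

Dividing by $\sqrt{s!}$ is what keeps every value a proper fraction, as the text remarks.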
In the Tables for Statisticians and Biometricians* a table (XXIX) is given
providing the values of $\tau_1$, $\tau_2$, $\tau_3$, $\tau_4$, $\tau_5$ and $\tau_6$ for each one thousandth in the value
of $\tfrac{1}{2}(1-\alpha_h)$. The value of $h$ is also given, but not of course by equal increments
of argument. A useful table of $\phi(t) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}t^2}$ and its first eight differentials
for the argument $t$ proceeding by hundredths, is given by Professor James W.
Glover, pp. 392–411 of his Tables of Applied Mathematics in Finance, Insurance
and Statistics. But this table only goes to five decimal places, and third differences
would be requisite to use the table to its maximum advantage. The ideal table
would be one of tetrachoric functions going to 7 or 8 decimals and exhibiting
first and second differences, the argument proceeding by .01. Our present table


* Part I issued by the Biometric Laboratory, University College, London, 2nd Edition, 1924.
forms a framework for the construction of such a table. But while the computing
would be possible the cost of printing the requisite number of royal octavo pages
of figures would at present be prohibitive. Accordingly we have had to content
ourselves with the more modest table now printed. This table, if used to its
maximum value, i.e. to obtain the tetrachoric functions to seven figures, requires
the use of $\delta^2\tau_s$ and $\delta^4\tau_s$, if central difference interpolation be adopted. These have
not been tabled as it would have doubled the cost of printing. They can, however,
be determined with a moderate amount of labour in two different manners.
(i) Using the formulae

$$\delta^2\tau_s(h) = \tau_s(h+1) + \tau_s(h-1) - 2\tau_s(h) \qquad \text{(i)},$$
$$\delta^4\tau_s(h) = \delta^2\tau_s(h+1) + \delta^2\tau_s(h-1) - 2\,\delta^2\tau_s(h) \qquad \text{(ii)},$$

we find $\delta^2\tau_s(h)$, $\delta^2\tau_s(h+1)$, $\delta^4\tau_s(h)$ and $\delta^4\tau_s(h+1)$ for interpolating a value
between $\tau_s(h)$ and $\tau_s(h+1)$. This involves taking out $\tau_s(h-2)$, $\tau_s(h-1)$, $\tau_s(h)$,
$\tau_s(h+1)$, $\tau_s(h+2)$ and $\tau_s(h+3)$, i.e. six entries in all.

For example, suppose we desire a value of $\tau_{13}$ between 1.0 and 1.1. We have:

$\tau_{13}(0.8)$ = −.042,6729,
$\tau_{13}(0.9)$ = −.043,0704,   $\delta^2\tau_{13}(0.9)$ = +.005,3224,
$\tau_{13}(1.0)$ = −.038,1455,   $\delta^2\tau_{13}(1.0)$ = +.004,1923,   $\delta^4\tau_{13}(1.0)$ = −.000,4818,
$\tau_{13}(1.1)$ = −.029,0283,   $\delta^2\tau_{13}(1.1)$ = +.002,5804,   $\delta^4\tau_{13}(1.1)$ = −.000,1935,
$\tau_{13}(1.2)$ = −.017,3307,   $\delta^2\tau_{13}(1.2)$ = +.000,7750,
$\tau_{13}(1.3)$ = −.004,8581.

But the central difference formula runs, if $\phi = 1 - \theta$:

$$\tau_{13}(1.0+\theta) = \theta\,\tau_{13}(1.1) + \phi\,\tau_{13}(1.0) - \tfrac{1}{6}\theta\phi\,\{(1+\theta)\,\delta^2\tau_{13}(1.1) + (1+\phi)\,\delta^2\tau_{13}(1.0)\}$$
$$\qquad + \tfrac{1}{120}\theta\phi(1+\theta)(1+\phi)\,\{(2+\theta)\,\delta^4\tau_{13}(1.1) + (2+\phi)\,\delta^4\tau_{13}(1.0)\} - \text{etc.} \qquad (3).$$

Suppose $\theta$ = .3467, then

$\tau_{13}(1.03467)$ = .3467 (−.029,0283) + .6533 (−.038,1455)
        − (1/6) (.3467) (.6533) {1.3467 (.002,5804) + 1.6533 (.004,1923)}
        + (1/120) (.3467) (1.3467) (.6533) (1.6533) {2.3467 (−.000,1935) + 2.6533 (−.000,4818)}
    = −.010,0641,1 − .024,9204,6 − .000,3928,3 − .000,0072,8
    = −.035,3847,

the required value.
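The arithmetic of the first method can be retraced in a few lines. The following Python sketch is illustrative only (the function names are assumptions); it takes the six tabled entries of $\tau_{13}$ quoted above, forms the second and fourth central differences from them, and applies the central difference formula with $\theta$ = .3467.

```python
def central_second_differences(values):
    """f(h+1) + f(h-1) - 2 f(h) for every interior entry of an equally spaced list."""
    return [values[i + 1] + values[i - 1] - 2.0 * values[i]
            for i in range(1, len(values) - 1)]

def central_difference_interpolate(f0, f1, d2_0, d2_1, d4_0, d4_1, theta):
    """Interpolate between f0 (at h) and f1 (at h+1), theta being the
    fraction of the tabular interval and phi = 1 - theta."""
    phi = 1.0 - theta
    return (theta * f1 + phi * f0
            - theta * phi / 6.0 * ((1.0 + theta) * d2_1 + (1.0 + phi) * d2_0)
            + theta * phi * (1.0 + theta) * (1.0 + phi) / 120.0
            * ((2.0 + theta) * d4_1 + (2.0 + phi) * d4_0))

# Tabled tau_13 at h = 0.8, 0.9, 1.0, 1.1, 1.2, 1.3 (from the example above).
tau13 = [-0.0426729, -0.0430704, -0.0381455, -0.0290283, -0.0173307, -0.0048581]
d2 = central_second_differences(tau13)   # second differences at 0.9, 1.0, 1.1, 1.2
d4 = central_second_differences(d2)      # fourth differences at 1.0, 1.1

value = central_difference_interpolate(tau13[2], tau13[3], d2[1], d2[2],
                                        d4[0], d4[1], 0.3467)
print(f"{value:.7f}")
```

The six entries are exactly those the text says must be taken out of the main table; nothing beyond them is needed.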


(ii) In the second method we make use of very close expressions deduced from
Taylor's Theorem for the higher differences. We have very approximately:

$$\delta^2\tau_s = \frac{1}{100}\sqrt{(s+1)(s+2)}\;\tau_{s+2} + \frac{1}{120{,}000}\sqrt{(s+1)(s+2)(s+3)(s+4)}\;\tau_{s+4} \qquad (4),$$

or: $\delta^2\tau_s = {}_1\chi_s\,\tau_{s+2} + {}_2\chi_s\,\tau_{s+4}$,

$$\delta^4\tau_s = \frac{1}{10{,}000}\sqrt{(s+1)(s+2)(s+3)(s+4)}\;\tau_{s+4} + \frac{1}{6{,}000{,}000}\sqrt{(s+1)(s+2)(s+3)(s+4)(s+5)(s+6)}\;\tau_{s+6} \qquad (5),$$

or: $\delta^4\tau_s = {}_3\chi_s\,\tau_{s+4} + {}_4\chi_s\,\tau_{s+6}$.

Hence, if a table of the $\chi$'s be provided, we can write down the values of $\delta^2\tau_s$
and $\delta^4\tau_s$ by a single operation on the machine for each. This is shorter than
deducing the differences from the main table itself. The formulae (4) and (5) give
the differences extremely closely, so that we can proceed by this method as far as
$\delta^2\tau_{15}$ and $\delta^4\tau_{13}$, not further, because we should require higher tetrachorics than $\tau_{19}$,
which is the last tabled. But most computers will rarely need to go even as far
as $\tau_{13}$. It is only in special table-construction work, for which the present table
was ancillary, that these high tetrachoric functions are needful, a point to which
we will recur shortly.


TABLE I.

Table of the $\chi$-Coefficients.

$$\tau_s(h+\theta) = \theta\,\tau_s(h+1) + \phi\,\tau_s(h) - \tfrac{1}{6}\theta\phi\,\{(1+\theta)\,\delta^2\tau_s(h+1) + (1+\phi)\,\delta^2\tau_s(h)\}$$
$$\qquad + \tfrac{1}{120}\theta\phi(1+\theta)(1+\phi)\,\{(2+\theta)\,\delta^4\tau_s(h+1) + (2+\phi)\,\delta^4\tau_s(h)\} \qquad (6),$$
$$\delta^2\tau_s(h) = {}_1\chi_s\cdot\tau_{s+2}(h) + {}_2\chi_s\cdot\tau_{s+4}(h) \qquad (7),$$
$$\delta^4\tau_s(h) = {}_3\chi_s\cdot\tau_{s+4}(h) + {}_4\chi_s\cdot\tau_{s+6}(h) \qquad (8).$$

Order of
Tetrachoric        $_1\chi_s$          $_2\chi_s$          $_3\chi_s$          $_4\chi_s$
Function $s$
     0          .0141,4214      .0000,4082      .0004,8990      .0000,0447
     1          .0244,9490      .0000,9129      .0010,9545      .0000,1183
     2          .0346,4102      .0001,5811      .0018,9737      .0000,2366
     3          .0447,2136      .0002,4152      .0028,9828      .0000,4099
     4          .0547,7226      .0003,4157      .0040,9878      .0000,6481
     5          .0648,0741      .0004,5826      .0054,9909      .0000,9612
     6          .0748,3315−     .0005,9161      .0070,9930      .0001,3594
     7          .0848,5281      .0007,4162      .0088,9944      .0001,8526
     8          .0948,6833      .0009,0830      .0108,9954      .0002,4507
     9          .1048,8089      .0010,9163      .0130,9962      .0003,1639
    10          .1148,9125+     .0012,9164      .0154,9968      .0004,0020
    11          .1248,9996      .0015,0831      .0180,9972      .0004,9751
    12          .1349,0738      .0017,4165      .0208,9976      .0006,0933
    13          .1449,1377      .0019,9165      .0238,9979      .0007,3664
    14          .1549,1933      .0022,5832      .0270,9981      .0008,8045+
    15          .1649,2422      .0025,4165      .0304,9984      .0010,4177

For the purposes of calculation it is convenient to note that

$$\tfrac{1}{120}\,\theta\phi\,(1+\theta)(1+\phi) = \tfrac{1}{15}\left(\tfrac{1}{4}\theta\phi\right)\left(1 + \tfrac{1}{2}\theta\phi\right),$$

which needs only one multiplication.
We will now illustrate the method of Equations (7) and (8) on the previous
example.

$\delta^2\tau_{13}(1.0)$ = .1449,1377 (.029,2108) + .0019,9165 (−.020,5443),
$\delta^2\tau_{13}(1.1)$ = .1449,1377 (.017,9171) + .0019,9165 (−.008,0852).

Hence:   $\delta^2\tau_{13}(1.0)$ = .004,1921,   $\delta^2\tau_{13}(1.1)$ = .002,5803.

These values are as correct as, possibly more correct than, the values .004,1923
and .002,5804 found from the main table itself, as in the latter case the apparent
higher differences may be considerably affected by the raising of the last figure of
the tabled entries. Similarly

$\delta^4\tau_{13}(1.0)$ = .0238,9979 (−.020,5443) + .0007,3664 (+.012,5634),
$\delta^4\tau_{13}(1.1)$ = .0238,9979 (−.008,0852) + .0007,3664 (−.000,2002).

Hence:   $\delta^4\tau_{13}(1.0)$ = −.000,4817,   $\delta^4\tau_{13}(1.1)$ = −.000,1934.

These are again in good agreement with the values found, i.e. −.000,4818 and
−.000,1935, by the more laborious operation from the main table.

If we again substitute in equation (6) to find $\tau_{13}(1.03467)$ we have

$\tau_{13}(1.03467)$ = −.035,3847

as before*.
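The second method lends itself to the same kind of check. In the Python sketch below (an illustration; the function names are assumptions) the $\chi$-coefficients are computed from their square-root expressions rather than read from Table I, and the tabled values of $\tau_{15}$, $\tau_{17}$ and $\tau_{19}$ at $h$ = 1.0 and 1.1 are those quoted in the worked example.

```python
import math

def chi_coefficients(s):
    """Return (1chi_s, 2chi_s, 3chi_s, 4chi_s) for a tabular interval of 0.1."""
    two = (s + 1) * (s + 2)
    four = two * (s + 3) * (s + 4)
    six = four * (s + 5) * (s + 6)
    return (math.sqrt(two) / 100.0,
            math.sqrt(four) / 120000.0,
            math.sqrt(four) / 10000.0,
            math.sqrt(six) / 6000000.0)

def differences_from_higher_orders(s, tau_s2, tau_s4, tau_s6):
    """Second and fourth central differences of tau_s from tau_(s+2),
    tau_(s+4) and tau_(s+6) at the same argument h."""
    c1, c2, c3, c4 = chi_coefficients(s)
    return c1 * tau_s2 + c2 * tau_s4, c3 * tau_s4 + c4 * tau_s6

# tau_15, tau_17, tau_19 at h = 1.0 and at h = 1.1 (values quoted in the example).
d2_10, d4_10 = differences_from_higher_orders(13, 0.0292108, -0.0205443, 0.0125634)
d2_11, d4_11 = differences_from_higher_orders(13, 0.0179171, -0.0080852, -0.0002002)
print(f"{d2_10:.7f} {d2_11:.7f} {d4_10:.7f} {d4_11:.7f}")
```

Each difference costs one short sum of products, against the six table entries and repeated subtractions of the first method.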


We may conclude, I think, that the second method is shorter and fully adequate
if we do not want differences beyond those of $\tau_{13}$. If we do, we must either
compute enough additional $\tau_s$'s by means of the difference formula:

$$\tau_s(h) = p_s\,h\,\tau_{s-1}(h) - q_s\,\tau_{s-2}(h) \qquad (9),$$

where $p_s$ and $q_s$ are given in Table II below, or we must of necessity fall back on
the first method. The first method, however, will fail whatever be the value of $s$

TABLE II.
Values of $p_s$ and $q_s$.

  s       p_s              q_s         |   s       p_s              q_s
  2   .707,1067,812   .000,0000,000   |  14   .267,2612,419   .889,4991,800
  3   .577,3502,692   .408,2482,905   |  15   .258,1988,897   .897,0852,271
  4   .500,0000,000   .577,3502,692   |  16   .250,0000,000   .903,6961,141
  5   .447,2135,955   .670,8203,933   |  17   .242,5356,250   .909,5085,939
  6   .408,2482,905   .730,2967,433   |  18   .235,7022,604   .914,6591,208
  7   .377,9644,730   .771,5167,498   |  19   .229,4157,339   .919,2547,198
  8   .353,5533,906   .801,7837,257   |  20   .223,6067,978   .923,3805,169
  9   .333,3333,333   .824,9579,114   |  21   .218,2178,902   .927,1050,693
 10   .316,2277,660   .843,2740,427   |  22   .213,2007,163   .930,4842,104
 11   .301,5113,446   .858,1163,303   |  23   .208,5144,141   .933,5638,714
 12   .288,6751,346   .870,3882,798   |  24   .204,1241,452   .936,3821,838
 13   .277,3500,981   .880,7048,459   |  25   .200,0000,000   .938,9710,681

N.B. The last figure in $q_s$ may be a unit in doubt, but this will not affect seven
figure accuracy in the $\tau$'s.

* Actually by the first method we have −.035,3846,8 and by the second method
−.035,3846,6.
when we approach the under limit of our main table, for we cannot get $\delta^4\tau_s(h)$
from the table itself for values of $h$ greater than 3.7, for $\delta^4\tau_s(h)$ requires a
knowledge of $\tau_s(h+.3)$; we are therefore compelled in such cases to work with the less
accurate backward difference method. Our second method does not however fail
in these cases, even for $h = 4$, until $s = 13$, and accordingly should be used at the
foot of the table for $h = 3.8$, 3.9 and 4.0 for values of $s$ from 0 to 13. We have
accordingly the following scheme:


Order of Tetrachoric Functions.

                   |  0 to 13               |  14 to 19
 h = 0.0 to 3.7    |  Central Differences,  |  Central Differences,
                   |  Second Method         |  First Method
 h = 3.8 to 4.0    |  Central Differences,  |  Backward Differences,
                   |  Second Method         |  First Method


By the “First Method” is merely meant finding any differences from the table
itself, and by the “Second Method” finding central differences from the formulae
(7) and (8). It is open to question, whether for 7,, and 7,, the use of (9) might not
be better, at any rate for the range $h = 3.8$ to 4.0 inclusive.
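A brief sketch of the difference formula (9) may help those who wish to extend the table by machine. It assumes, on the evidence of the entries in Table II, that $p_s = 1/\sqrt{s}$ and $q_s = (s-2)/\sqrt{s(s-1)}$, and that $\tau_1(h)$ is the ordinate of the normal curve; the function names are illustrative and are not taken from the paper.

```python
import math


def p(s):
    # Assumed closed form; it reproduces the p_s column of Table II.
    return 1.0 / math.sqrt(s)


def q(s):
    # Assumed closed form; it reproduces the q_s column of Table II.
    return (s - 2) / math.sqrt(s * (s - 1))


def tetrachoric(h, s_max):
    """Return {s: tau_s(h)} for s = 1..s_max (s_max >= 2) by the recurrence
    tau_s = p_s * h * tau_{s-1} - q_s * tau_{s-2}."""
    tau = {1: math.exp(-0.5 * h * h) / math.sqrt(2.0 * math.pi)}
    tau[2] = p(2) * h * tau[1]      # q_2 = 0, so tau_0 is not required
    for s in range(3, s_max + 1):
        tau[s] = p(s) * h * tau[s - 1] - q(s) * tau[s - 2]
    return tau


at_zero = tetrachoric(0.0, 9)
at_one = tetrachoric(1.0, 9)
for s in range(5, 10):
    print(s, round(at_zero[s], 7), round(at_one[s], 7))
```

The printed values may be compared with the rows for $h = 0.0$ and $h = 1.0$ of the main table for $\tau_5$ to $\tau_9$.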


It may be desirable to give further illustration. Let us suppose we need to
find $\tau_8(1.4765)$.

Here $\theta = .765$, $\phi = .235$.
$$\delta^2\tau_8(1.4) = .0948{,}6833\,(-.036{,}1788) + .0009{,}0830\,(.034{,}5869),$$
$$\delta^2\tau_8(1.5) = .0948{,}6833\,(-.036{,}7973) + .0009{,}0830\,(.030{,}4286),$$
$$\delta^4\tau_8(1.4) = .0108{,}9954\,(.034{,}5869) + .0002{,}4507\,(-.028{,}2655),$$
$$\delta^4\tau_8(1.5) = .0108{,}9954\,(.030{,}4286) + .0002{,}4507\,(-.020{,}6873).$$
Hence:
$$\delta^2\tau_8(1.4) = -.003{,}4008, \qquad \delta^2\tau_8(1.5) = -.003{,}4633,$$
$$\delta^4\tau_8(1.4) = +.000{,}3701, \qquad \delta^4\tau_8(1.5) = +.000{,}3266\text{*}.$$

Accordingly:
$$\begin{aligned}
\tau_8(1.4765) &= .765 \times (.035{,}1482) + .235\,(.028{,}8707) \\
&\quad - \tfrac{1}{6}(.765 \times .235)\,\{1.765\,(-.003{,}4633) + 1.235\,(-.003{,}4008)\} \\
&\quad + \tfrac{1}{120}(.765 \times .235)(1.765 \times 1.235)\,\{2.765\,(+.000{,}3266) + 2.235\,(+.000{,}3701)\} \\
&= .033{,}6729{,}9 + .000{,}3089{,}9 + .000{,}0056{,}5 = .033{,}9876.
\end{aligned}$$
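The arithmetic of this illustration can be retraced with a short routine for the central difference formula (6), taken here in its usual Everett form as far as fourth differences. The function name `everett` and its argument order are illustrative; the numerical inputs are those quoted in the text.

```python
def everett(theta, f0, f1, d2_0, d2_1, d4_0=0.0, d4_1=0.0):
    """Interpolate between tabled values f0 (at h) and f1 (at the next
    tabular point) using their second and fourth central differences."""
    phi = 1.0 - theta
    value = theta * f1 + phi * f0
    value -= theta * phi / 6.0 * ((1 + theta) * d2_1 + (1 + phi) * d2_0)
    value += (theta * phi * (1 + theta) * (1 + phi) / 120.0
              * ((2 + theta) * d4_1 + (2 + phi) * d4_0))
    return value


# tau_8 at h = 1.4 and 1.5, with the differences found by the second method.
tau8 = everett(
    theta=0.765,
    f0=0.0288707, f1=0.0351482,
    d2_0=-0.0034008, d2_1=-0.0034633,
    d4_0=0.0003701, d4_1=0.0003266,
)
print(round(tau8, 7))
```

The three contributions are about .0336730, .0003090 and .0000057, in agreement with the figures set out above.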


We will compare this with the result obtainable from Professor Glover's Table
for $\phi^{\mathrm{vii}}(1.4765)$†.
Clearly,
$$\tau_8(1.4765) = -\frac{1}{\sqrt{8!}}\,\phi^{\mathrm{vii}}(1.4765) = -\frac{\phi^{\mathrm{vii}}(1.4765)}{200.79840637}.$$

Taking out of Glover's Table:

   h      φ^vii(h)     δ²φ^vii(h)
  1.45    −6.51527
  1.46    −6.63784     +.00704
  1.47    −6.75337     +.00705
  1.48    −6.86185     +.00704
  1.49    −6.96329     +.00703
  1.50    −7.05770

It is accordingly idle to proceed to $\delta^4\phi^{\mathrm{vii}}(h)$.

* The corresponding values found from the table itself are: −.003,4009, −.003,4634, +.000,3701
and +.000,3268, and the agreement is seen to be good.
† Loc. cit. pp. 397 and 399.
Substituting in the central difference formula (6), we have $\theta = .65$, $\phi = .35$ and:
$$\phi^{\mathrm{vii}}(1.4765) = .65\,(-6.86185) + .35\,(-6.75337) - \tfrac{1}{6}(.65 \times .35)\,\{1.65\,(.00704) + 1.35\,(.00705)\} = -6.824{,}683.$$

Accordingly:
$$\tau_8(1.4765) = +.033{,}9877,$$
while the value found from our table is .033,9876, a difference of only a unit in the
seventh decimal. Thus we can depend on Glover's Table giving us the tetrachoric
function correct to the sixth place*. On the other hand we find for $\phi^{\mathrm{vii}}(1.4765)$ as
calculated from our seven-figure tetrachoric function the value −6.824,656, which
suggests that in an interpolated value Glover's Table may give us an error of 3 in
the fifth decimal place. For a great variety of statistical purposes this is ample,
but it indicates that we are not introducing unnecessary figures when we
supplement for many purposes our original five-figure tables of the tetrachoric functions
by seven-figure tables.
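The comparison with Glover's Table may be checked in a few lines. The sketch assumes that the tabled function is the seventh derivative of the normal ordinate, so that $\tau_8 = -\phi^{\mathrm{vii}}/\sqrt{8!}$, and uses the entries at 1.47 and 1.48 with their second differences as quoted; the variable names are illustrative.

```python
import math

theta, phi = 0.65, 0.35
f0, f1 = -6.75337, -6.86185        # seventh derivative at h = 1.47 and 1.48
d2_0, d2_1 = 0.00705, 0.00704      # second differences at 1.47 and 1.48

# Central difference formula (6), carried to second differences only.
phi7 = theta * f1 + phi * f0 - theta * phi / 6.0 * (
    (1 + theta) * d2_1 + (1 + phi) * d2_0
)

root_8_factorial = math.sqrt(math.factorial(8))   # 200.79840637...
tau8_from_glover = -phi7 / root_8_factorial

# Converse step: the seventh derivative implied by the seven-figure tau_8.
phi7_from_tau8 = -root_8_factorial * 0.0339876

print(round(phi7, 6), round(tau8_from_glover, 7), round(phi7_from_tau8, 6))
```

The two values of the seventh derivative differ by about 3 in the fifth decimal place, as stated in the text.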


Those who choose to describe frequency not diverging too greatly from the
normal by tetrachoric series, will find two fundamental formulae useful. The first
gives the ordinate $z_x$ of the frequency curve in terms of the total frequency $N$,
the standard deviation $\sigma_x$ for the character $x$ and tetrachoric functions of $h = x/\sigma_x$.
The second gives the area of the frequency curve from a given value of $h = x/\sigma_x$ up
to the end of the range, i.e. $h = \infty$.

They are, if we represent this tail area by $N\tfrac12(1-\alpha_x)$, in accordance with the
notation of the probability integral of the normal curve:

$$z_x = \frac{N}{\sigma_x}\Big\{\tau_1(h) + \sqrt{\tfrac{2}{3}}\,\sqrt{\beta_1}\,\tau_4(h) + \sqrt{\tfrac{5}{24}}\,(\beta_2-3)\,\tau_5(h)\Big\} \tag{10}$$
or:
$$z_x = \frac{N}{\sigma_x}\,\{\tau_1(h) + .8164{,}9658\,\sqrt{\beta_1}\,\tau_4(h) + .4564{,}3546\,(\beta_2-3)\,\tau_5(h)\} \tag{10 bis}$$

Integrating:
$$\int_{-\infty}^{h} z_x\,dx = N\Big\{\tau_0'(h) - \tfrac{1}{\sqrt{6}}\,\sqrt{\beta_1}\,\tau_3(h) - \tfrac{1}{\sqrt{24}}\,(\beta_2-3)\,\tau_4(h)\Big\},$$
or:
$$N\tfrac12(1+\alpha_x) = N\,\{\tau_0'(h) - .4082{,}4829\,\sqrt{\beta_1}\,\tau_3(h) - .2041{,}2415\,(\beta_2-3)\,\tau_4(h)\} \tag{11}$$

On the other hand, if $h$ be negative, $\alpha_x$ changes sign and the even tetrachoric
functions change sign. Thus:
$$\int_{-\infty}^{-h} z_x\,dx = N\tfrac12(1-\alpha_x) = N\,\{\tau_0''(h) - .4082{,}4829\,\sqrt{\beta_1}\,\tau_3(h) + .2041{,}2415\,(\beta_2-3)\,\tau_4(h)\} \tag{11 bis}$$

* The divergence between the two tables is far from being always as great as this. Thus I found
from the present table that $\tau_9(1.4765) = -.017{,}8327$, while Glover's Table gave the value −.017,8326,
a close accordance, but in this case the contributions of $\delta^4\tau_9(1.4)$ and $\delta^4\tau_9(1.5)$ only give a total when
combined with those of $\delta^2\tau_9(1.4)$ and $\delta^2\tau_9(1.5)$ of −.000,0080, so that little depends on the higher
differences.
Here $\beta_1 = \mu_3^2/\mu_2^3$, $\beta_2 = \mu_4/\mu_2^2$, where $\mu_s$ is the $s$th moment coefficient of the
distribution, and $\sqrt{\beta_1}$ follows the sign of $\mu_3$. $\tau_0'(h)$ is the $\tfrac12(1+\alpha)$, and $\tau_0''(h)$ is
the $\tfrac12(1-\alpha)$ of the probability integral of the normal curve. The $\tau_0(h)$ of our
Table
$$= \int_{-\infty}^{-h} \frac{1}{\sqrt{2\pi}}\,e^{-\frac12 x^2}\,dx = \int_{h}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac12 x^2}\,dx$$
$= \tfrac12(1-\alpha)$ if $h$ be negative, $= 1 - \tfrac12(1+\alpha)$ if $h$ be positive,
as in the second integral. Accordingly:
$$\tau_0'(h) = 1 - \tau_0(h), \qquad \tau_0''(h) = \tau_0(h).$$

This artifice is adopted to avoid tabling $\tau_0(h)$ throughout the range from $-\infty$ to
$+\infty$. The like difficulty does not occur with other even order tetrachorics, they
are all odd in $h$, and merely change their sign with $h$. This point must be borne
in mind when plotting $z_x$ from Equations (10) and (10) bis; the second term, i.e. the
one in $\tau_4(h)$, will be negative for $h$ negative.

It is clear that Equations (11) and (11) bis will provide the frequency of any
cell of a distribution, by subtracting $N\tfrac12(1+\alpha_x)$ from $N\tfrac12(1+\alpha_{x'})$, $h' > h$, if $h$ and
$h'$ correspond to the limits of the subrange of the cell. For applications of these
equations, the reader is referred to pp. 289–290, 303–304 of the current number.
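A hedged sketch of Equations (10) bis and (11) follows. It assumes $\tau_s(h) = He_{s-1}(h)\,\phi(h)/\sqrt{s!}$ for $s \ge 1$, with $He$ the Hermite polynomials of the probability integral and $\phi$ the normal ordinate, and takes $\tau_0'(h)$ as the normal area below $h$; the function names and the trial constants are illustrative and not from the paper.

```python
import math


def tau(s, h):
    """tau_s(h) = He_{s-1}(h) * phi(h) / sqrt(s!), for s >= 1 (assumed form)."""
    he_prev, he = 1.0, h            # He_0 and He_1
    if s == 1:
        he = 1.0
    for n in range(1, s - 1):       # raise the degree up to He_{s-1}
        he_prev, he = he, h * he - n * he_prev
    return he * math.exp(-0.5 * h * h) / math.sqrt(2.0 * math.pi * math.factorial(s))


def ordinate(N, sigma, root_beta1, beta2, h):
    """Equation (10) bis: ordinate z_x of the frequency curve."""
    return N / sigma * (
        tau(1, h)
        + math.sqrt(2.0 / 3.0) * root_beta1 * tau(4, h)
        + math.sqrt(5.0 / 24.0) * (beta2 - 3.0) * tau(5, h)
    )


def area_below(N, root_beta1, beta2, h):
    """Equation (11): frequency lying below h."""
    lower_normal = 0.5 * (1.0 + math.erf(h / math.sqrt(2.0)))   # tau_0'(h)
    return N * (
        lower_normal
        - root_beta1 / math.sqrt(6.0) * tau(3, h)
        - (beta2 - 3.0) / math.sqrt(24.0) * tau(4, h)
    )


def cell_frequency(N, root_beta1, beta2, h_low, h_high):
    """Frequency of the cell between h_low and h_high, by subtraction."""
    return area_below(N, root_beta1, beta2, h_high) - area_below(N, root_beta1, beta2, h_low)


# Trial constants (hypothetical): N = 1000, sqrt(beta_1) = 0.4, beta_2 = 3.5.
N, root_beta1, beta2 = 1000.0, 0.4, 3.5
print(ordinate(N, 1.0, root_beta1, beta2, 0.7))
print(cell_frequency(N, root_beta1, beta2, 0.5, 1.0))
```

Differentiating `area_below` numerically with respect to $h$ returns `ordinate` for $\sigma_x = 1$, which is a convenient check that the two sets of multipliers have been entered consistently.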


I have to acknowledge a grant from the Government Grant Committee of the 
Royal Society, which has enabled Dr Alice Lee to devote her time to the calculation 
of this table and certain other tables shortly to be published. 
Table of the Tetrachoric Functions to Seven Decimal Places

Table of the First Twenty Tetrachoric Functions (continued).

  h        τ₅            τ₆            τ₇            τ₈            τ₉         h
 0.0   +.109 2548    +.000 0000    −.084 2919    −.000 0000    +.069 5373    0.0
 0.1   +.106 5394    +.022 0425+   −.081 3638    −.020 5500−   +.066 4367    0.1
 0.2   +.098 5813    +.042 5587    −.072 8400    −.039 2734    +.057 4717    0.2
 0.3   +.085 9288    +.060 1576    −.059 4743    −.054 5416    +.043 6096    0.3
 0.4   +.069 4420    +.073 7045−   −.042 4326    −.065 0959    +.026 3256    0.4
 0.5   +.050 2172    +.082 4144    −.023 1686    −.070 1742    +.007 4174    0.5
 0.6   +.029 4944    +.085 9085+   −.003 2732    −.069 5744    −.011 2147    0.6
 0.7   +.008 5543    +.084 2295+   +.015 6853    −.063 6520    −.027 7918    0.7
 0.8   −.011 3820    +.077 8153    +.032 3105+   −.053 2523    −.040 8554    0.8
 0.9   −.029 2429    +.067 4365    +.045 5011    −.039 5911    −.049 4138    0.9
 1.0   −.044 1776    +.054 1063    +.054 5340    −.024 1009    −.053 0219    1.0
 1.1   −.055 6023    +.038 9747    +.059 1023    −.008 2639    −.051 7870    1.1
 1.2   −.063 2204    +.023 2182    +.059 3064    +.006 5456    −.046 3071    1.2
 1.3   −.067 0162    +.007 9380    +.055 6045*   +.019 1923    −.037 5547    1.3
 1.4   −.067 2256    −.005 9246    +.048 7307    +.028 8707    −.026 7277    1.4
 1.5   −.064 2891    −.017 6481    +.039 5946    +.035 1482    −.015 0898    1.5
 1.6   −.058 7935*   −.026 7631    +.029 1754    +.037 9623    −.003 8219    1.6
 1.7   −.051 4089    −.033 0572    +.018 4223    +.037 5773    +.006 0962    1.7
 1.8   −.042 8277    −.036 5561    +.008 1718    +.034 5106    +.013 9649    1.8
 1.9   −.033 7104    −.037 4849    −.000 9110    +.029 4428    +.019 3986    1.9
 2.0   −.024 6434    −.036 2182    −.008 3656    +.023 1238    +.022 3172    2.0
 2.1   −.016 1083    −.033 2243    −.013 9432    +.016 2865−   +.022 9031    2.1
 2.2   −.008 4664    −.029 0109    −.017 5912    +.009 5777    +.021 5356    2.2
 2.3   −.001 9547    −.024 0766    −.019 4221    +.003 5107    +.018 7140    2.3
 2.4   +.003 3069    −.018 8733    −.019 6716    −.001 5596    +.014 9806    2.4
 2.5   +.007 3005−   −.013 7793    −.018 6527    −.005 4388    +.010 8554    2.5
 2.6   +.010 0902    −.009 0845+   −.016 7122    −.008 0787    +.006 7853    2.6
 2.7   +.011 8000    −.004 9870    −.014 1931    −.009 5502    +.003 1136    2.7
 2.8   +.012 5914    −.001 5978    −.011 4054    −.010 0097    +.000 0666    2.8
 2.9   +.012 6436    +.001 0474    −.008 6067    −.009 6643    −.002 2420    2.9
 3.0   +.012 1371    +.002 9730    −.005 9930    −.008 7402    −.003 7962    3.0
 3.1   +.011 2405−   +.004 2467    −.003 6964    −.007 4562    −.004 6554    3.1
 3.2   +.010 1022    +.004 9635+   −.001 7907    −.006 0056    −.004 9287    3.2
 3.3   +.008 8455+   +.005 2310    −.000 3000    −.004 5441    −.004 7510    3.3
 3.4   +.007 5673    +.005 1577    +.000 7897    −.003 1860    −.004 2623    3.4
 3.5   +.006 3383    +.004 8449    +.001 5191    −.002 0048    −.003 5921    3.5
 3.6   +.005 2061    +.004 3807    +.001 9441    −.001 0379    −.002 8493    3.6
 3.7   +.004 1986    +.003 8375−   +.002 1273    −.000 2940    −.002 1175*   3.7
 3.8   +.003 3280    +.003 2709    +.002 1303    +.000 2395+   −.001 4540    3.8
 3.9   +.002 5948    +.002 7212    +.002 0092    +.000 5887    −.000 8923    3.9
 4.0   +.001 9914    +.002 2145−   +.001 8116    +.000 7865−   −.000 4459    4.0
MODERN PROBLEMS IN VITAL STATISTICS*.
By Professor HARALD WESTERGAARD, Copenhagen.


It is of course impossible in a single lecture to enter into details with regard 
to all the problems of Vital Statistics. I shall confine myself to some general 
remarks, giving as I hope a comprehensive view of the subject with some addi- 
tional observations on certain fields, which—as far as I can see—might be cultivated 
with good results. 





The construction of life tables being the first step forward in the evolution of 
political arithmetic it may be fitting to begin with this problem. In fact the life- 
table formulae can easily be generalised so as to embrace other problems in vital 
statistics. 


Using the continuous method, which can be traced as far back as Daniel
Bernoulli and Duvillard, we can express the number of persons between the ages
of $x$ and $x+\epsilon$ ($\epsilon$ being infinitely small) at the moment $t$ as $\epsilon p_{x,t}$, $p$ being a function
dependent on $x$ and $t$. If the observations are collected by a life insurance society
among assured lives it will present very little difficulty to find the value of $p_{x,t}$
with great accuracy at any given moment. The problem is somewhat more difficult
if we have to deal with census results for a whole population, with long intervals,
for instance 10 years, between the enumerations. We are then compelled to use
more or less refined methods of interpolation. The differences between the results
of these various methods are however as a rule not very great, so that we can use
any of these methods for all practical purposes without much hesitation. The
problem will be to find the variation of the value of $p_{x,t}$ if the time, as well as the
age, gets a small increase $\epsilon$, or in other words to find the first differential coefficient:

$$\frac{dp_{x,t}}{dx} + \frac{dp_{x,t}}{dt}.$$

Between the two moments $t_1$ and $t_2$ the group in question has embraced the
number of individuals

$$\epsilon \int_{t_1}^{t_2} p_{x,t}\,dt.$$

And dividing this quantity into the number of deaths observed during the
same period, viz. $\epsilon d_x$, we find the instantaneous rate of mortality (the force of
mortality) to be

$$\mu_x = \frac{d_x}{\int_{t_1}^{t_2} p_{x,t}\,dt}.$$

* Being a lecture delivered in the University of London to advanced students of statistics,


Having found $\mu_x$ we can easily develop various formulae which are useful in
mortality statistics. Thus ($l_x$ being the number surviving out of a certain number
of individuals of the same age) we shall have:

$$\mu_x = -\frac{1}{l_x}\,\frac{dl_x}{dx},$$
or
$$l_{x_2} = l_{x_1}\,e^{-\int_{x_1}^{x_2}\mu_x\,dx},$$

and if $\mu_x$ can be considered as constant in the interval $x_1$ to $x_2$ we shall have

$$l_{x_2} = l_{x_1}\,e^{-(x_2-x_1)\,\mu_x}.$$
If the interval is unity this equation is reduced to the following:
$$l_{x_1+1} = l_{x_1}\,e^{-\mu} = l_{x_1}\Big(1 - \mu + \frac{\mu^2}{2} - \frac{\mu^3}{6} + \dots\Big).$$
And the probability of a person aged $x_1$ dying within a year will be:
$$1 - e^{-\mu} = \mu - \frac{\mu^2}{2} + \frac{\mu^3}{6} - \dots$$

With approximation the expression on the right side of this equation may be
replaced by the following*:
$$\frac{\mu}{1+\tfrac12\mu},$$
the difference being less than $\tfrac{1}{12}\mu^3$. The probability of dying within unity of time
will thus be a trifle smaller than the force of mortality.
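The closeness of such approximations is easily tried numerically. The sketch below assumes that the approximation intended is $\mu/(1+\tfrac12\mu)$ and that the editor's alternative is $\mu/(1+\tfrac14\mu)^2$, with bounds of $\mu^3/12$ and $\mu^3/48$ respectively; these forms are inferred from the series expansions, since the printed expressions are imperfectly preserved, and the function names are illustrative.

```python
import math


def q_exact(mu):
    """Probability of dying within unity of time: 1 - exp(-mu)."""
    return -math.expm1(-mu)


def q_first(mu):
    """Assumed form of the approximation in the text."""
    return mu / (1.0 + mu / 2.0)


def q_closer(mu):
    """Assumed form of the closer approximation of the editorial footnote."""
    return mu / (1.0 + mu / 4.0) ** 2


for mu in (0.005, 0.03, 0.1):
    exact = q_exact(mu)
    print(mu, exact, q_first(mu) - exact, q_closer(mu) - exact, mu - exact)
```

In each case the probability falls short of the force of mortality by roughly $\tfrac12\mu^2$, while the two approximations exceed the exact value by less than the stated bounds.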

Again we can resolve $\mu$ into a sum of quantities each one representing the
frequency of deaths from various causes, say tuberculosis, alcoholism, etc. If for
instance $\mu = \mu' + \mu''$ we can calculate fictitious life tables putting

$$l'_{x+1} = l'_x\,e^{-\mu'} \quad\text{and}\quad l''_{x+1} = l''_x\,e^{-\mu''},$$
and we easily find
$$\frac{l_{x+1}}{l_x} = \frac{l'_{x+1}}{l'_x}\cdot\frac{l''_{x+1}}{l''_x}.$$
If one of the causes of death disappears through some factor of sanitary or social
progress (as for instance Jenner's Vaccination against smallpox a century ago)
this formula will give us the means to judge of the effect on the life table.
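A small numerical sketch of the fictitious life tables may be useful. The forces of mortality below are hypothetical figures chosen only for illustration, and the function name `survivors` is not from the lecture.

```python
import math


def survivors(l0, forces):
    """Life table l_x, l_{x+1}, ... from yearly forces of mortality,
    each taken as constant within its year of age."""
    table = [l0]
    for mu in forces:
        table.append(table[-1] * math.exp(-mu))
    return table


# Hypothetical forces of mortality from two groups of causes.
mu_one = [0.004, 0.005, 0.006]
mu_two = [0.002, 0.002, 0.003]

all_causes = survivors(1.0, [a + b for a, b in zip(mu_one, mu_two)])
first_only = survivors(1.0, mu_one)     # second group of causes removed
second_only = survivors(1.0, mu_two)    # first group of causes removed

for total, one, two in zip(all_causes, first_only, second_only):
    print(round(total, 6), round(one * two, 6))
```

The table `first_only` shows what the survivorship would be if the second group of causes disappeared, which is the use of the formula contemplated in the text.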

Frequently we desire to measure such influence by means of the expectation of
life, viz. the average number of years which a person of a given age will attain
according to the life table. At the age of $x_1$, the value of the expectation of life
will be

$$e_{x_1} = \frac{1}{l_{x_1}}\int_{x_1}^{\omega} l_x\,dx,$$

where $\omega$ is the highest age attainable according to the life table, for instance 100
years.
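The expectation of life may be evaluated from any survivorship function by simple quadrature. The following is a minimal sketch under stated assumptions: the trapezoidal rule is used, the survivorship is taken as exponential with a constant force of mortality of 3 per cent. (a hypothetical case chosen because the integral is then known exactly), and the name `expectation_of_life` is illustrative.

```python
import math


def expectation_of_life(l, x1, omega, steps=10000):
    """(1 / l(x1)) * integral of l(x) from x1 to omega, trapezoidal rule."""
    width = (omega - x1) / steps
    total = 0.5 * (l(x1) + l(omega))
    for i in range(1, steps):
        total += l(x1 + i * width)
    return total * width / l(x1)


mu = 0.03                                   # hypothetical constant force


def l_exponential(x):
    return math.exp(-mu * x)


estimate = expectation_of_life(l_exponential, 0.0, 100.0)
exact = (1.0 - math.exp(-mu * 100.0)) / mu  # closed form for this case
print(round(estimate, 4), round(exact, 4))
```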


* [Still closer by $\mu/(1+\tfrac14\mu)^2$, the difference being then less than $\tfrac{1}{48}\mu^3$. ED.]
If the age-distribution of a population corresponds to that of a life table, as for
instance according to Halley's supposition, we can easily prove that the expectation
of life is identical with the number of persons out of whom one will die in
unity of time. In a moment of length $\epsilon$ the number of deaths out of $\int_{x_1}^{\omega} l_x\,dx$ will
be $l_{x_1}\epsilon$, and consequently in unity of time $l_{x_1}$ will die. This problem puzzled our
forefathers in the 18th century.

If for instance the rate of mortality of the whole population was 3 per cent.
yearly, the expectation of life for a new-born child could be estimated at 33⅓
years. But at the same time the birth-rate was perhaps 4 per cent., the population
thus not being constant; they therefore concluded that the expectation of
life would lie somewhere between 33⅓ and 25 years, for instance:

$$\frac{100}{3.5} = 28.6 \quad\text{or}\quad \tfrac12\,(33\tfrac13 + 25) = 29.2.$$
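The two compromise figures can be retraced as follows. The reading of the first as the reciprocal of the mean of the two rates is an assumption, adopted because it reproduces 28.6; the variable names are illustrative.

```python
death_rate = 0.03    # yearly rate of mortality of the whole population
birth_rate = 0.04    # yearly birth-rate

from_deaths = 1.0 / death_rate                 # 33 1/3 years
from_births = 1.0 / birth_rate                 # 25 years

# Assumed reading: reciprocal of the mean rate, i.e. 100 / 3.5.
reciprocal_of_mean_rate = 1.0 / ((death_rate + birth_rate) / 2.0)
# Simple mean of the two estimates.
mean_of_estimates = (from_deaths + from_births) / 2.0

print(round(reciprocal_of_mean_rate, 1), round(mean_of_estimates, 1))
```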

By changing the terms we can easily adapt the preceding formulae to other
branches of vital statistics, asking for instance how a group of bachelors will be
gradually reduced by marriage and mortality, or, in criminal statistics, what is
the probability that a person of a given age will never be punished for a crime.

Difficulties will rarely meet us with regard to mathematical formulae, the 
great obstacles as a rule lying in the observations. If these on the whole are at 
hand they will frequently be more or less defective, and it will be the task of the 
statistician to find his way in spite of these defects. Such difficulties for instance 
appear in marriage statistics, where the observations frequently are inaccurate 
because the weddings registered do not correspond exactly to the population 
enumerated within the district. It is relatively easy to get tolerably good obser- 
vations on deaths, but it is otherwise with regard to invalidity and infirmity. We 
can calculate the surplus emigration from a country, adding perhaps some partial 
observations on ages of emigrants to America or Australasia, but it is most difficult 
to get exhaustive statistical observations on migrations within the country. 


Whatever subject we have to deal with in the first instance the problem will 
be to find the value of a fraction, i.e. the rate of frequency of a certain event. 

The problem can easily be solved if we have sufficiently accurate observations 
both on the numerator and the denominator of the fraction. But frequently the 
denominator is unknown. We are perhaps possessed of accurate accounts of the 
distribution of births, deaths and marriages in the single months of the calendar 
year, whereas the corresponding group of the population is unknown. Still we are 
not altogether excluded from drawing conclusions if we feel justified in supposing 
that the population concerned does not vary too much within the space of time 
concerned. As a rule we may without much hesitation make investigations as to 
the influence of season in vital statistics without taking the population into con- 
sideration or without noticing the secular trend of the movements of mortality or 
fertility ; on the other hand in Economic Statistics we must be much more careful, 
the movements being so conspicuous that we cannot compare January and December 
in the same calendar year without taking these movements into consideration.
As problems of this kind in vital statistics I may mention the prevalence of certain
diseases in months which have particularly high rates of infant mortality. Without
knowing the number of infants exposed to risk we may ask whether an excessive
heat in August or September will prove fatal to babies under 1 year, etc.


It is not uncommon to meet proposals for dealing with one-sided material, as for 
instance the distribution of deaths according to causes. The well known Hungarian 
statistician Körösy proposed for instance to compare the risk of dying from certain
causes of death in various classes of society by calculating the ratio of the number
of these cases to the total number of deaths in the class concerned. If in one class
of society tuberculosis caused 30 deaths out of 100, in another one only 20, then
he felt justified in concluding that tuberculosis was more dangerous in the former
group than in the latter. Körösy was well aware that the rate of mortality
from all causes might be as 2 to 3 in the two classes of society; this being the 
case, the frequency of death from tuberculosis would be exactly the same. Possibly 
even the real death rate from tuberculosis might be smaller in the former class 
than in the latter. His idea was however that if the rates of mortality from all 
causes were as 2 to 3, then normally the same ratio ought to hold for the various 
causes of death, and any excess above what he thus considered as normal should 
be interpreted as testifying to an increased influence of the disease concerned. 
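The arithmetic behind this caution is short. Only the shares (30 and 20 out of 100 deaths) and the ratio 2 to 3 of the all-cause rates come from the text; the absolute level of the rates below is a hypothetical scale chosen for illustration.

```python
# Share of all deaths due to tuberculosis in the two classes (from the text).
share_former, share_latter = 0.30, 0.20

# All-cause rates of mortality in the ratio 2 : 3 (scale is hypothetical).
all_causes_former, all_causes_latter = 0.020, 0.030

tuberculosis_former = share_former * all_causes_former
tuberculosis_latter = share_latter * all_causes_latter

print(tuberculosis_former, tuberculosis_latter)
```

Both products come to 6 per mille, so the larger share in the former class does not by itself show a greater risk.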


Corresponding proposals have been recently made with the view to construct- 
ing mortality tables from observations on causes of death upon the supposition 
that we know the frequency distribution according to some formula. But even if 
a mortality table calculated in this way may seem remarkably trustworthy we 
have evidently no security that the rates of mortality which we find provide the 
true values. These defective observations cannot possibly replace complete obser- 
vations on the denominator as well as on the numerator. 


Turning to the opposite case: the denominator being known, whereas we have 
no observations on the numerator, we shall find somewhat better chances of reach- 
ing tolerably safe conclusions. In fact the census gives us the results of the 
movements of the population. Knowing for instance the distribution of the 
population above a certain age between bachelors, widowers, etc. we have some 
evidence with regard to the chances of entering into married life. An example 
may illustrate this side of the problem. 


Let us from the Census calculate the relative numbers of bachelors at each 
age. Thus according to the Danish Census 1911 among 1000 males aged 40—45, 
115 were still bachelors. In the following quinquennial age the quota was 
reduced to 96, and in the ages 50—55, 85 out of 1000 were still unmarried. The 
question is whether we can use such numbers to find the rate of marriage of 
bachelors at various ages. 


Let the number of bachelors aged $x$ to $x+\epsilon$ be $\epsilon b_{x,t}$, and let the rate of
marriage be $m_x$, whereas $\delta_x'$ signifies the rate of decrease through mortality and
migration of bachelors. Then we shall have as the result of the changes through
the small element of time:
$$\frac{db_{x,t}}{dx} + \frac{db_{x,t}}{dt} = -b_{x,t}\,(\delta_x' + m_x),$$

whereas for the whole population we have:

$$\frac{dp_{x,t}}{dx} + \frac{dp_{x,t}}{dt} = -p_{x,t}\,\delta_x,$$
where $\delta$ in the same way represents the decrease on account of mortality and
migration.

Lastly we can take the ratio $r_{x,t} = b_{x,t}/p_{x,t}$ and we shall find by differentiation:

$$\frac{dr_{x,t}}{dx} + \frac{dr_{x,t}}{dt} = -r_{x,t}\,(\delta_x' - \delta_x + m_x).$$


Generally it will not be difficult to calculate the values on the left side of the 
equation with sufficient approximation. Knowing, for instance, the relative numbers 
at two consecutive censuses with an interval of, say, 10 years, it is a matter of
interpolation to find the variation according to time and age. Thus the rate in 1901
at the ages 20–25 being 881 and 10 years later 876 we may estimate the yearly
rate of decrease according to time at 0.0006.

It is a well known fact that mortality among bachelors is higher than in the
remaining population. Probably also migrations are more prevalent. Thus we
have $\delta_x' > \delta_x$. If we therefore leave $\delta_x' - \delta_x$ out of consideration, thus putting

$$\frac{dr_{x,t}}{dx} + \frac{dr_{x,t}}{dt} = -r_{x,t}\,m_x,$$
we shall find a value of $m_x$ which is somewhat too high. But the difference
between this value and the true one will as a rule be comparatively small. In the
period of life in which most marriages take place the rate of marriage will be
10 per cent. or more, whereas the force of mortality and probably often the rate of
migration will be only a few per mille. This being the case the difference $\delta_x' - \delta_x$
cannot count for much, so that the values calculated from the equation above will
on the whole be fairly correct. In fact the results are in good harmony with the
values found directly by comparing the numbers of marriages to the enumerated
numbers of bachelors.
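As a rough illustration of the simplified equation, the proportions of bachelors quoted above from the Danish Census of 1911 (115, 96 and 85 per 1000 in the three quinquennial groups from 40 to 55) may be differenced with respect to age. The sketch neglects the variation with time and the difference $\delta_x' - \delta_x$, both of which are assumptions made here for brevity; the function name is illustrative.

```python
def marriage_rate(r_previous, r_here, r_next, age_step, dr_dt=0.0):
    """m_x from (dr/dx + dr/dt) = -r * m_x, with dr/dx taken as a central
    difference over two neighbouring age groups age_step years apart."""
    dr_dx = (r_next - r_previous) / (2.0 * age_step)
    return -(dr_dx + dr_dt) / r_here


# Bachelors per 1000 males aged 40-45, 45-50 and 50-55 (census of 1911).
m_45_to_50 = marriage_rate(115.0, 96.0, 85.0, age_step=5.0)
print(round(m_45_to_50, 4))
```

The result, about 3 per cent. yearly, is only as good as the neglected terms are small.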


It would of course be superfluous to make calculations of this kind if we can 
get direct observations. But the method may be of use in retrospective calcula- 
tions concerning census reports for remote periods. Thus we are possessed of a 
highly interesting Danish Census for 1787 giving the structure of the society in 
those times—much more simple indeed than now-a-days. Using the method 
which I have described we find the rate of marriage for males aged 25 to be only 
4 per cent. whereas it was about 12 per cent. in 1911—20. But at 35 it was in 
1787 14 per cent. against 10 now-a-days, and at 45 we find 13 per cent. against 
only 3 per cent. at present. These results show a striking contrast to modern 
experience. 
Another curious problem is presented by the religious statistics. In Denmark
every census gives observations on the various confessions. An increasing number
of both sexes declare themselves as not belonging to any church. Taking 1901 as
basis we may ask how many persons should be expected to be standing outside
any confession 10 years later in 1911. Let the rate of withdrawals over and above
the rate of re-entrance be $\omega_{x,t}$, then in a given element of time $\epsilon$ we shall have
$\omega_{x,t}\,\epsilon$ surplus withdrawals. If the number of persons at the first census be $p_{x,t}$, we
shall have as the corresponding number 10 years later $p_{x+10,t+10}$ and after $n$ years
$p_{x+n,t+n}$. The number of withdrawals between $t+n$ and $t+n+\epsilon$ will be:

$$\omega_{x+n,t+n}\,p_{x+n,t+n}\,\epsilon$$
and that number will probably be reduced to
$$\omega_{x+n,t+n}\,p_{x+10,t+10}\,\epsilon$$
at the following census. For the whole interval there will thus be a gain of

$$\int_0^{10} p_{x+10,t+10}\,\omega_{x+n,t+n}\,dn,$$
or approximately:
$$10\,p_{x+10,t+10}\,\omega_{x+5,t+5}.$$

The number of males belonging to some confession at the ages of 25–30 was in
1911 about 98,000. The surplus of persons not belonging to a confession was
1050. The rate of withdrawals for the ages 20–25 will thus probably be 0.0011.
For 30–35 we find 0.0006. Above 40 the rate of withdrawals is reduced to a
very small value.
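The rate for the ages 20–25 follows from the approximate expression by a single division, as the following lines show. The figures are those quoted in the text; the variable names are illustrative.

```python
members_in_1911 = 98000.0     # males aged 25-30 belonging to some confession
surplus_outside = 1050.0      # gain of persons outside any confession
interval_years = 10.0

# gain = 10 * p * (rate at the middle of the interval), solved for the rate.
withdrawal_rate = surplus_outside / (interval_years * members_in_1911)
print(round(withdrawal_rate, 4))
```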

Just the same method may be applied to the statistics of migrations. Suppose
as usual

$$\delta_x = \mu_x + i_x,$$

where $\mu_x$ is the force of mortality and $i_x$ the surplus rate of migration (supposed
constant with regard to $t$ within the interval), then we shall have:

$$p_{x+10,t+10} = p_{x,t}\,e^{-\int_0^{10}\delta_{x+n}\,dn},$$
or approximately
$$e^{-10\,\delta_{x+5}} = \frac{p_{x+10,t+10}}{p_{x,t}}.$$
This equation will give us $\delta_x$ and subtracting $\mu_x$ we find $i_x$.
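The computation may be sketched as follows. The census counts and the force of mortality used here are hypothetical figures inserted only to show the steps, since the Danish values are not reproduced in this place; the function name is illustrative.

```python
import math


def rate_of_decrease(p_start, p_end, years=10.0):
    """delta at the middle of the interval, from exp(-years * delta) = p_end / p_start."""
    return -math.log(p_end / p_start) / years


# Hypothetical counts of one generation at two censuses ten years apart.
p_1901, p_1911 = 100000.0, 90000.0
force_of_mortality = 0.005            # hypothetical mu at the middle age

delta = rate_of_decrease(p_1901, p_1911)
net_emigration = delta - force_of_mortality
print(round(delta, 6), round(net_emigration, 6))
```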
The main results for Denmark in respect to males are the following for the
years 1901–11:

Age | Force of mortality p.m. $\mu_x$ | Rate of net-emigration p.m. $i_x$

After 35 the rate of emigration is very small. It is principally the young men
who leave the country. As to the females emigration is much smaller and the 
influence of age is less conspicuous. 


These results can again be used in other domains where direct observations 
are lacking and where it will therefore be reasonable to try indirect methods. Let 
us endeavour to find probable rates of mortality for the feeble-minded, using the 
Danish Census of 1911 as base. As usual we have to find the value of

$$r_{x,t}\,(\mu_x' + i_x' - \mu_x - i_x - m_x),$$
where $m_x$ is the rate of fresh cases of feeble-mindedness. This quantity seems
however to be very small. Most of the persons registered as feeble-minded seem to
be born as such. We can thus leave $m_x$ out of consideration, the result probably
being that the rate of mortality of the feeble-minded is underestimated a little.
We can also probably consider $i_x'$, the rate of emigration of feeble-minded, as
a negligible quantity so that we have only to deal with

$$\mu_x' - \mu_x - i_x.$$
The results will be the following rates of mortality for male idiots:

   Age     | Idiots, p.m. | Normal persons, p.m.
  20–25    |     30       |      4
  25–30    |     24       |      4
  30–35    |     26       |      5
  35–40    |     28       |      6
  40–45    |     30       |      7
  45–50    |     36       |     10
  50–55    |     40       |     13
  55–60    |     40       |     19
  60–65    |     59       |     27


These minimum values are in good harmony with an investigation which I 
made several years ago concerning mortality in asylums for the feeble-minded, 
basing my conclusions on direct observations on deaths and on numbers exposed
to risk. 


The table testifies to a very high mortality among the feeble-minded especially 
in earlier years of life so that the rates of mortality of the feeble-minded remain 
nearly constant through a long period of life. 

Studying statistical observations on other defects we find curious results with 
regard to blindness, but here it is the rate of the incidence of blindness and not the 
mortality of the blind which we are approximating to. Supposing that few blind 
persons recover and that the mortality of blind persons will differ little from that 
of the general population, and confining ourselves only to the advanced years of 
age where migrations are trifling, we can try to estimate the chances of becoming 
blind in various ages, the expression $\mu_x' + i_x' - \mu_x - i_x - m_x$, where $m_x$ is the rate of
becoming blind, being reduced to $-m_x$. We conclude from the observations that
out of 100,000 males at the age of 55-60 only four will become blind yearly, in the
following quinquennium of age 6, etc.

For sociological purposes however it may often be considered sufficient to use 
still simpler methods to ascertain the influence of migrations and other displace- 
ments of population. We may confine ourselves to finding the gain or loss from 
one census to another. 


Taking as instance the numbers of Roman Catholics in Denmark, 1901 and 1911, 
we find that 623 females aged 5-15 years belonged to this church in 1901. 
According to the age-distribution 10 years later we should expect to find 591 in 
1911 in the age group 15-25. The number observed was however 1265 so that 
the gain was 674. 

The above quoted census of 1787 gives interesting evidence as to the displace- 
ments between various classes of society, in the younger years for instance from the 
group of masters to that of journeymen, in mature age in the opposite direction. 
In the rural districts the class of servants gains large numbers from the peasants, 
the small-holders and others, till in the age group 30-40 the tide is turning. As 
one of the burning questions of the day which may be treated by this elementary 
method I can mention migrations within a country from the rural districts to the 
towns. 


In conclusion I may refer to the complicated problem of birth statistics. First 
of all we have to find the birth-rates under various circumstances, for instance 
according to conjugal condition, age of mother, etc. If sufficient observations are 
at hand we can calculate tables corresponding to mortality tables, comparing the 
results for various epochs in order to follow the decrease of fertility which has 
been so conspicuous in the last decennia. Observations have been obtained in 
various countries, in Scandinavia, New Zealand and Australia. In France we can 
follow the movement back until 1896, in Denmark even as far as 1870. Behind these 
problems other important questions arise, for instance to find the probability of a 
wife at a certain age bearing a child when we take into consideration the duration 
of her marriage and the number of children which she has borne previously. Further 
the influence of the size of the family or of the rapidity with which births follow 
each other, on the infant mortality. Several of these problems may be treated 
with advantage by means of privately collected observations such as Ch. Ansell’s 
Family Statistics (1874). In North America useful observations of this kind have 
been collected of late. But on the whole the field is not yet exhausted. 

If we use Census results only we shall have difficulty in applying the indirect 
methods of finding the birth-rates just described, the decrease of fertility com- 
plicating the problem. But it will not be difficult to draw indirect conclusions of a 
more elementary and preliminary character from Census reports. A Danish Census 
of 1880 in Copenhagen provided material of this kind with particulars as to the 
number of children alive or dead at the moment of the Census. These observations 
gave evidence of a considerable difference between various classes of society, and 
just the same was the case in a later Census for the whole country. Important 
conclusions could be drawn concerning the vitality of the children born in families 
of various size testifying to an enormous harvest of death in marriages with a very 
numerous offspring. It is unnecessary to add that investigations of this kind will 
necessitate some standardization of the observations in order to outbalance the 
effect of disturbing causes, as for instance of different age-distributions in various 
classes of society. Similar observations were made later on in other countries ; 
thus in England by the Census of 1911, Professor Pearson having previously drawn 
attention to these problems. In Norway the Census of 1920 gave observations 
of a similar kind. It is to be hoped that several countries will follow Norway 
and that on the whole birth statistics will be recognized as one of the most fertile 
subjects for both modern statistics and sociology. 


Generally I hope it will flow from what I have said that our statisticians of 
to-day need not fear that they will be thrown out of work. On the contrary they 
are likely to suffer from a cumbersome embarras de richesses. 
















ON THE EXTREME INDIVIDUALS AND THE RANGE OF 
SAMPLES TAKEN FROM A NORMAL POPULATION. 


By L. H. C. TIPPETT, B.Sc. Lond. 


THE problem of the range of samples arises as a special case of Galton’s 
Difference Problem, first given by Professor K. Pearson in 1902 (1). Together 
with the allied problem of the extreme individuals, it has engaged the attention 
of other writers (2) (3) (4), but a complete solution has not yet been given. This 
would involve the determination of the distribution of the range and of the 
extremes for a large number of samples. Attempts are here made in some measure 
to supply this deficiency. 


I. THE FIRST OR LARGEST INDIVIDUAL. 


Let us consider samples of size $n$ to be taken from an infinite population 
represented by the curve $y = \phi(x)$, which we will suppose extends to infinity in 
each direction, and let* 

$$\alpha_1 = \int_{-\infty}^{x_1} \phi(x)\,dx, \quad \text{where} \quad \int_{-\infty}^{+\infty} \phi(x)\,dx = 1.$$

Then the chance that the $n$ individuals of the sample shall be smaller than $x_1$ 
equals $\alpha_1^{\,n}$, and this is the chance that the largest shall lie between $x = -\infty$ and 
$x = x_1$. 

Hence, if $y = f(x)$ be the distribution of the largest individual in samples of $n$ 
(where $\int_{-\infty}^{+\infty} f(x)\,dx = 1$), we have for the probability integral of this curve 

$$\int_{-\infty}^{x_1} f(x)\,dx = \alpha_1^{\,n} \qquad (1).$$

This has been determined for samples from a normal population for several 
values of $n$, and some of the results are given in Table IX at the end of the paper†. 
The values of the variate are all in terms of the standard deviation of the original 
population. The constants given in Table I (the mean, standard deviation, $\beta_1$ and 
$\beta_2$) have been found from the grouped form of the distribution, using Sheppard’s 
corrections, and are also plotted in Diagrams I, II and III. The mean is half the 
mean range, which is given in Diagram V. Since the normal curve is symmetrical, 
the distribution of the smallest individual is quite similar, excepting that the 
variate has the opposite sign. 
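
The constants of the largest individual can be checked numerically from the probability integral $\alpha_1^{\,n}$ alone. The following is a minimal sketch, not the author's procedure (the paper worked from grouped tables with Sheppard's corrections): it assumes a unit normal population, and the helper names `normal_cdf`, `largest_cdf`, `simpson` and `largest_mean_sd` are hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Probability integral (alpha) of the unit normal curve up to x."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def largest_cdf(x: float, n: int) -> float:
    """Chance that the largest of n individuals lies below x, i.e. alpha**n."""
    return normal_cdf(x) ** n


def simpson(f, a: float, b: float, panels: int = 4000) -> float:
    """Composite Simpson rule; `panels` must be even."""
    h = (b - a) / panels
    total = f(a) + f(b)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def largest_mean_sd(n: int, limit: float = 10.0):
    """Mean and standard deviation of the largest individual in samples of n."""
    # Mean = int_0^inf (1 - F) dx - int_-inf^0 F dx, with F = alpha**n.
    mean = (simpson(lambda x: 1.0 - largest_cdf(x, n), 0.0, limit)
            - simpson(lambda x: largest_cdf(x, n), -limit, 0.0))
    # Second moment about zero = 2 int_0^inf x (1 - F) dx - 2 int_-inf^0 x F dx.
    second = (simpson(lambda x: 2.0 * x * (1.0 - largest_cdf(x, n)), 0.0, limit)
              - simpson(lambda x: 2.0 * x * largest_cdf(x, n), -limit, 0.0))
    return mean, math.sqrt(second - mean * mean)
```

Under these assumptions `largest_mean_sd(10)` should agree closely with the mean and standard deviation entered against n = 10 in Table I.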

* This $\alpha$ must not be confused with the $\alpha$ of Dr Sheppard’s Tables of the Probability Integral; 
throughout this paper, the $\alpha$ used is equivalent to $\tfrac{1}{2}(1+\alpha)$ of those tables. 

† Full tables are in manuscript, and it is hoped that they will be published later. 
Diagram III. 

TABLE I. 
Constants of the Distribution of the Largest Individual in 
Samples from a Normal Population. 

Size of                  Standard 
Sample n     Mean        Deviation    $\beta_1$    $\beta_2$ 
     2        .56419      .8257        .019     3.062 
     5       1.16297      .6690        .092     3.202 
    10       1.53875      .5868        .168     3.331 
    20       1.86747      .5251        .251     3.469 
    60       2.31928      .4545        .376     3.677 
   100       2.50759      .4294        .429     3.765 
   200       2.74604      .4009        .495     3.875 
   500       3.03670      .3704        .570     4.003 
  1000       3.24144      .3514        .618     4.088 

It is interesting to note that we have here an example of a statistical variate 
the distribution of which diverges more from normality as the sample from which 
it is measured increases in size, at least this is so to the limit n = 1000. 
The tables of the distribution of the extremes may well be used for deciding 
whether or not any outlying member of a sample should be rejected. The deviation 
of this individual from the mean is measured in terms of the standard deviation, 
and from Table IX, the chance of a deviation occurring as great or greater 
than this is found. This chance is $(1 - \alpha_1^{\,n})$, where $\alpha_1^{\,n}$ is the value read directly 
from the table, and if it falls below a certain limiting value, the individual is 
rejected. In Table II are given values of the deviations of the extreme variate 
corresponding to limiting chances of 1% and 5%. These are plotted on 
Diagram IV, which can be used when the samples are of intermediate size. 
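
The limiting deviations follow from inverting $1 - \alpha_1^{\,n} = P$ for the chosen chance $P$. A short sketch of that inversion is given here; it assumes a unit normal population and uses the standard-library `statistics.NormalDist`, and the function name `rejection_limit` is hypothetical.

```python
from statistics import NormalDist


def rejection_limit(n: int, chance: float) -> float:
    """Deviation which the largest of n individuals exceeds with the given chance."""
    # 1 - alpha**n = chance  =>  alpha = (1 - chance) ** (1 / n)
    alpha = (1.0 - chance) ** (1.0 / n)
    return NormalDist().inv_cdf(alpha)


# An outlying individual would be rejected when its deviation, in units of the
# standard deviation, exceeds the limit for the sample size in question.
limit_5_per_cent = rejection_limit(100, 0.05)
limit_1_per_cent = rejection_limit(100, 0.01)
```

The values so obtained should agree with those of Table II to within a few units in the third decimal place.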
Diagram IV. 

TABLE II. 
Values of the Extreme Variate occurring with Limiting 
Probabilities of 5% and 1% in Samples of Size n. 

     n      5 per cent.    1 per cent. 
    10        2.5695         3.0924 
    20        2.7992         3.2893 
    50        3.0831         3.5393 
   100        3.2838         3.7180 
   200        3.4740         3.8894 
   500        3.7126         4.1063 
  1000        3.8845         4.2638 

II. ON THE THEORETICAL RANGE. 


The range is defined as the difference in character between the largest and 
smallest individuals of a sample. It has not been found possible to write down 
the distribution of this in any useful form, so the procedure has been to find those 
constants involving the first four moments, so that an appropriate Pearson Curve 
can be fitted. It is usually found that curves obtained in this way give good fits 
to actual data, so the method will be adequate for practical purposes. 


(i) The Mean. 


Professor Pearson in his paper (1) gave an expression for the mean difference 
between the $p$th and $(p+1)$th individuals 

$$\chi_p = \frac{n!}{(n-p)!\,p!} \int_{-\infty}^{+\infty} \alpha^{\,n-p}\,(1-\alpha)^{p}\,dx \qquad (2),$$

from which, by summing for all values of $p$ from 1 to $n-1$, we obtain for the 
mean range 

$$\bar{w} = \int_{-\infty}^{+\infty} \{1 - \alpha^{n} - (1-\alpha)^{n}\}\,dx \qquad (3).$$


This relation may be arrived at in a slightly different way*. 


Let the figure represent the curve of distribution of the original population 
$y = \phi(x)$, and, as before, let 

$$\alpha_1 = \int_{-\infty}^{x_1} \phi(x)\,dx, \quad \text{where} \quad \int_{-\infty}^{+\infty} \phi(x)\,dx = 1.$$

Let $x_1$ be the character of the first individual, and $x_n$ that of the last in a sample 
of size $n$. 

Then, if we suppose the original population to be infinite, the chance that we 
have one individual at $x_1$, one at $x_n$ and $n-2$ between 

$$= \frac{n!}{(n-2)!}\,(\alpha_1 - \alpha_n)^{n-2}\,d\alpha_1\,d\alpha_n \qquad (4).$$

Whence we have for the mean range 

$$\bar{w} = \frac{n!}{(n-2)!} \int_{x_1=-\infty}^{x_1=+\infty} \int_{x_n=-\infty}^{x_n=x_1} (\alpha_1 - \alpha_n)^{n-2}\,(x_1 - x_n)\,d\alpha_n\,d\alpha_1 \qquad (5).$$

But 

$$(\alpha_1 - \alpha_n)^{n-2} = \sum_{s=0}^{n-2} (-1)^s \frac{(n-2)!}{s!\,(n-2-s)!}\,\alpha_1^{\,n-2-s}\,\alpha_n^{\,s}.$$

* This proof was suggested by Prof. Pearson. Mr J. O. Irwin gives yet another proof in (5). 
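Therefore 

$$\bar{w} = n! \sum_{s=0}^{n-2} \frac{(-1)^s}{s!\,(n-2-s)!} \int_{x_1=-\infty}^{x_1=+\infty} \alpha_1^{\,n-2-s}\,d\alpha_1 \int_{x_n=-\infty}^{x_n=x_1} \alpha_n^{\,s}\,(x_1 - x_n)\,d\alpha_n \qquad (6).$$

Now 

$$\int_{x_n=-\infty}^{x_n=x_1} \alpha_n^{\,s}\,(x_1 - x_n)\,d\alpha_n = \left[\frac{(x_1 - x_n)\,\alpha_n^{\,s+1}}{s+1}\right]_{x_n=-\infty}^{x_n=x_1} + \frac{1}{s+1} \int_{-\infty}^{x_1} \alpha_n^{\,s+1}\,dx_n.$$

The term in square brackets vanishes at both limits, so that we obtain, by 
substituting in (6), 

$$\bar{w} = n! \sum_{s=0}^{n-2} \frac{(-1)^s}{(s+1)!\,(n-2-s)!} \int_{x_1=-\infty}^{x_1=+\infty} \alpha_1^{\,n-2-s}\,U\,d\alpha_1,$$

where $U = \int_{-\infty}^{x_1} \alpha_n^{\,s+1}\,dx_n$. 

Let 

$$\theta = \int \alpha_1^{\,n-2-s}\,d\alpha_1 = -\frac{1 - \alpha_1^{\,n-1-s}}{n-1-s},$$

the constant of integration being so chosen that $\theta$ vanishes at $x_1 = +\infty$. Then 

$$\bar{w} = n! \sum_{s=0}^{n-2} \frac{(-1)^s}{(s+1)!\,(n-2-s)!} \int_{x_1=-\infty}^{x_1=+\infty} U\,d\theta.$$

Integrating by parts, we obtain 

$$\bar{w} = n! \sum_{s=0}^{n-2} \frac{(-1)^s}{(s+1)!\,(n-2-s)!} \left\{ \Big[\theta U\Big]_{x_1=-\infty}^{x_1=+\infty} - \int_{-\infty}^{+\infty} \theta\,\frac{dU}{dx_1}\,dx_1 \right\}.$$

Again the term in square brackets vanishes, and 

$$\frac{dU}{dx_1} = \alpha_1^{\,s+1},$$

so that we have 

$$\bar{w} = \sum_{s=0}^{n-2} (-1)^s \frac{n!}{(s+1)!\,(n-1-s)!} \int_{-\infty}^{+\infty} (1 - \alpha_1^{\,n-1-s})\,\alpha_1^{\,s+1}\,dx_1,$$

leading to the result of equation (3), 

$$\bar{w} = \int_{-\infty}^{+\infty} \{1 - (1-\alpha)^{n} - \alpha^{n}\}\,dx.$$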
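
Equation (3) lends itself directly to quadrature. The sketch below is only an illustration under stated assumptions: a unit normal population, Simpson's rule over the interval from -10 to +10, and the hypothetical names `normal_cdf` and `mean_range`; it is not the computing scheme actually used for the tables.

```python
import math


def normal_cdf(x: float) -> float:
    """Probability integral (alpha) of the unit normal curve up to x."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def mean_range(n: int, limit: float = 10.0, panels: int = 4000) -> float:
    """Mean range from equation (3): integral of 1 - alpha**n - (1 - alpha)**n."""

    def integrand(x: float) -> float:
        a = normal_cdf(x)
        return 1.0 - a ** n - (1.0 - a) ** n

    h = 2.0 * limit / panels  # panels must be even for Simpson's rule
    total = integrand(-limit) + integrand(limit)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * integrand(-limit + i * h)
    return total * h / 3.0
```

For n = 2 the integral has the closed value $2/\sqrt{\pi}$, which gives a convenient check on the quadrature before larger samples are attempted.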


(ii) The Higher Moments. 

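In a similar manner, it is possible to obtain an expression for the higher 
moments. 

From (6) we see that the mean value of the $m$th moment (taken about the 
mean) 

$$\mu_m = n! \sum_{s=0}^{n-2} \frac{(-1)^s}{s!\,(n-2-s)!} \int_{x_1=-\infty}^{x_1=+\infty} \alpha_1^{\,n-2-s}\,d\alpha_1 \int_{x_n=-\infty}^{x_n=x_1} \alpha_n^{\,s}\,(x_1 - x_n - \bar{w})^{m}\,d\alpha_n \qquad (8).$$

Now 

$$\int_{x_n=-\infty}^{x_n=x_1} \alpha_n^{\,s}\,(x_1 - x_n - \bar{w})^{m}\,d\alpha_n = \left[\frac{(x_1 - x_n - \bar{w})^{m}\,\alpha_n^{\,s+1}}{s+1}\right]_{x_n=-\infty}^{x_n=x_1} + \frac{m}{s+1} \int_{-\infty}^{x_1} (x_1 - x_n - \bar{w})^{m-1}\,\alpha_n^{\,s+1}\,dx_n$$

$$= \frac{(-\bar{w})^{m}\,\alpha_1^{\,s+1}}{s+1} + \frac{m}{s+1}\,V \ \text{(say)}.$$

On substituting in (8), we obtain 

$$\mu_m = \sum_{s=0}^{n-2} (-1)^s \frac{n!}{(s+1)!\,(n-2-s)!} \left\{ (-\bar{w})^{m} \int_{x_1=-\infty}^{x_1=+\infty} \alpha_1^{\,n-1}\,d\alpha_1 + m \int_{x_1=-\infty}^{x_1=+\infty} \alpha_1^{\,n-2-s}\,V\,d\alpha_1 \right\}.$$

Let $J = \int_{x_1=-\infty}^{x_1=+\infty} \alpha_1^{\,n-2-s}\,V\,d\alpha_1$, and as before, let $\theta = \int \alpha_1^{\,n-2-s}\,d\alpha_1 = -\dfrac{1 - \alpha_1^{\,n-1-s}}{n-1-s}$. 

Then 

$$J = \int_{x_1=-\infty}^{x_1=+\infty} V\,d\theta = \Big[\theta V\Big]_{x_1=-\infty}^{x_1=+\infty} - \int_{-\infty}^{+\infty} \theta\,\frac{dV}{dx_1}\,dx_1.$$

The term in square brackets vanishes at both limits, and 

$$\frac{dV}{dx_1} = (-\bar{w})^{m-1}\,\alpha_1^{\,s+1} + (m-1) \int_{-\infty}^{x_1} (x_1 - x_n - \bar{w})^{m-2}\,\alpha_n^{\,s+1}\,dx_n,$$

so that 

$$J = (-\bar{w})^{m-1} \int_{-\infty}^{+\infty} \frac{1 - \alpha_1^{\,n-s-1}}{n-s-1}\,\alpha_1^{\,s+1}\,dx_1 + \frac{m-1}{n-s-1} \int_{-\infty}^{+\infty} \int_{-\infty}^{x_1} (1 - \alpha_1^{\,n-s-1})\,\alpha_n^{\,s+1}\,(x_1 - x_n - \bar{w})^{m-2}\,dx_n\,dx_1,$$

and 

$$\int_{x_1=-\infty}^{x_1=+\infty} \alpha_1^{\,n-1}\,d\alpha_1 = \frac{1}{n}.$$

Substituting in (8), we obtain 

$$\mu_m = (-\bar{w})^{m} \sum_{s=0}^{n-2} (-1)^s \frac{(n-1)!}{(s+1)!\,(n-s-2)!}$$
$$\quad + m\,(-\bar{w})^{m-1} \sum_{s=0}^{n-2} (-1)^s \frac{n!}{(s+1)!\,(n-s-1)!} \int_{-\infty}^{+\infty} (1 - \alpha_1^{\,n-s-1})\,\alpha_1^{\,s+1}\,dx_1$$
$$\quad + m(m-1) \sum_{s=0}^{n-2} (-1)^s \frac{n!}{(s+1)!\,(n-s-1)!} \int_{-\infty}^{+\infty} \int_{-\infty}^{x_1} (1 - \alpha_1^{\,n-s-1})\,\alpha_n^{\,s+1}\,(x_1 - x_n - \bar{w})^{m-2}\,dx_n\,dx_1.$$

If $n$ is even, the first term reduces to $(-\bar{w})^{m}$. 

The second term $= m\,\bar{w}\,(-\bar{w})^{m-1} = -m\,(-\bar{w})^{m}$. 

The third term becomes 

$$m(m-1) \int_{-\infty}^{+\infty} \int_{-\infty}^{x_1} \{1 - \alpha_1^{\,n} - (1-\alpha_n)^{n} + (\alpha_1 - \alpha_n)^{n}\}\,(x_1 - x_n - \bar{w})^{m-2}\,dx_n\,dx_1.$$

Hence, for even values of $n$, we have 

$$\mu_m = m(m-1) \int_{-\infty}^{+\infty} \int_{-\infty}^{x_1} \{1 - \alpha_1^{\,n} - (1-\alpha_n)^{n} + (\alpha_1 - \alpha_n)^{n}\}\,(x_1 - x_n - \bar{w})^{m-2}\,dx_n\,dx_1 - (m-1)\,(-\bar{w})^{m} \qquad (9),$$

and, on putting $m = 2$, 

$$\mu_2 = \sigma^2 = 2 \int_{-\infty}^{+\infty} \int_{-\infty}^{x_1} \{1 - \alpha_1^{\,n} - (1-\alpha_n)^{n} + (\alpha_1 - \alpha_n)^{n}\}\,dx_n\,dx_1 - \bar{w}^{2} \qquad (10).$$
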

Similarly by putting m=3 and m=4, the formulae for the third and fourth 
moments can be written down. 
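
The second moment can be evaluated by a repeated one-dimensional quadrature in place of a cubature formula. The following sketch assumes a unit normal population and takes the double integral of $1 - \alpha_1^{\,n} - (1-\alpha_n)^{n} + (\alpha_1-\alpha_n)^{n}$ over $x_n < x_1$, as in equation (10); the names `simpson`, `mean_range` and `range_variance`, the integration limits and the numbers of panels are all assumptions made for illustration.

```python
import math


def normal_cdf(x: float) -> float:
    """Probability integral (alpha) of the unit normal curve up to x."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def simpson(f, a: float, b: float, panels: int) -> float:
    """Composite Simpson rule; `panels` must be even."""
    h = (b - a) / panels
    total = f(a) + f(b)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def mean_range(n: int, limit: float = 8.0) -> float:
    """Mean range by equation (3)."""
    return simpson(
        lambda x: 1.0 - normal_cdf(x) ** n - (1.0 - normal_cdf(x)) ** n,
        -limit, limit, 2000)


def range_variance(n: int, limit: float = 8.0, panels: int = 300) -> float:
    """Second moment of the range about its mean, by equation (10)."""

    def inner(x1: float) -> float:
        a1 = normal_cdf(x1)

        def g(xn: float) -> float:
            an = normal_cdf(xn)
            return 1.0 - a1 ** n - (1.0 - an) ** n + (a1 - an) ** n

        # Inner integral over x_n from the lower limit up to x_1.
        return simpson(g, -limit, x1, panels)

    double_integral = simpson(inner, -limit, limit, panels)
    return 2.0 * double_integral - mean_range(n) ** 2
```

The square root of `range_variance(n)` may then be set against the standard deviations of the range given in Table IV; for n = 2 the exact value is $\sqrt{2 - 4/\pi}$.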

III. ON THE RANGE (RESULTS FOR A NORMAL CURVE). 
(i) The Mean. 


The mean range for a normal distribution has been found for all samples from 
two to one thousand, and is tabled at the end of the paper. A framework of values 
was found by direct computation of equation (3), using quadratures, and this was 
filled in by interpolation, using first Lagrangian Formulae, and finally a difference 
formula. The accuracy of the quadrature was checked by seeing that the full 
number of ordinates gave sensibly the same result as half the number. Here again, 
as indeed throughout the paper, the values are for a population having a unit 
standard deviation, and in any given case must be multiplied by the actual value 
of this quantity to obtain the absolute range. Diagram V illustrates the results 
graphically. 

Diagram V. 


The determinations of Bortkiewicz (2) and Dodd (3) are not always in good 
agreement with those of this paper; but they are admittedly approximations. By 
one method however, Bortkiewicz does obtain good values, as may be seen from 
Table III. 

TABLE III. 
Comparison of Values of Mean Range. 

Size of Sample    Bortkiewicz    True Mean 
        8            2.83          2.8472 
       24            3.90          3.8953 
       48            4.48          4.4662 
      100            5.03          5.0152 
      450            6.01          6.0090 
     1000            6.48          6.4829 


(ii) Higher Moments. 

The second moment, and hence the standard deviation of the distribution, was 
determined in several cases by means of equation (10). The work is very laborious, 
as it involves cubature, and even so, the result can only be given to a few figures. 
It is believed that the values given in Table IV are correct to the last figure. 
From the curve of Diagram VI, the standard deviation can be read off for samples 
of intermediate size. 

Diagram VI. 

Much time was spent in trying to evaluate the third and fourth moments from 
equation (9) by the same method, but there are many difficulties, and the results 
were most irregular. It will be seen that the formula (9) consists of two parts, 
which are subtracted from one another. These are nearly equal, so that they must 
separately be determined very accurately if their difference (giving the required 
moment) is to be relied upon. For example, in determining the fourth moment 
for $n = 200$, it was found that the term integrated was about 2729.8342, and the 
other term ($3\bar{w}^4$) 2729.4203, giving the fourth moment equal to .4140. In order 
that this should be correct to three figures, the integrated term would have to be 
found correct to seven figures, an accuracy which is very difficult to obtain by any 
cubature formula with a reasonable expenditure of labour. In this case about 
1500 ordinates were calculated, and when they were summed twice by means of 
the Euler-Maclaurin quadrature formula, the value 2729.8342 was obtained, while 
Weddle’s Rule gave 2729.7157 for the integrated term. These two give widely 
different, and hence wholly unreliable, values for the fourth moment. The 
discrepancies are accounted for by the fact that insufficient ordinates were taken; as 
far as the number of figures used is concerned, the result should be correct to 
three decimal places. The difficulty was more acute for the larger samples, and for 
the higher moments, but in all cases it prevented the determination of reliable 
values of $\mu_3$ and $\mu_4$*. Consequently, a method of obtaining these constants from 
those of the separate distributions of the first and last individuals was resorted to. 
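
The loss of figures in the subtraction can be made explicit with the numbers quoted for $n = 200$. The short calculation below uses only those quoted values; the variable names are hypothetical.

```python
# Quoted values of the integrated term of (9) for n = 200, m = 4.
integrated_euler_maclaurin = 2729.8342
integrated_weddle = 2729.7157
# Quoted value of the other term, 3 * (mean range) ** 4.
other_term = 2729.4203

# Fourth moment as the small difference of two nearly equal quantities.
mu4_euler_maclaurin = integrated_euler_maclaurin - other_term
mu4_weddle = integrated_weddle - other_term

# Relative disagreement of the two quadratures before and after subtraction.
gap_in_integrals = (integrated_euler_maclaurin - integrated_weddle) / integrated_euler_maclaurin
gap_in_moments = (mu4_euler_maclaurin - mu4_weddle) / mu4_euler_maclaurin

# Relative accuracy needed in the integrated term for three-figure accuracy
# (an absolute error of about 0.0005) in the fourth moment.
required_relative_accuracy = 0.0005 / integrated_euler_maclaurin
```

Two quadratures agreeing to about four parts in a hundred thousand thus give fourth moments differing by more than a quarter of their value, and the required relative accuracy of about two parts in ten million corresponds to the seven figures mentioned in the text.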


Let us consider a system of variates $u$, $v$ and $w$, measured from their respective 
means, and let $w = u - v$. Then, quite generally, we have 

$$\begin{aligned}
{}_w\mu_1 &= {}_u\mu_1 - {}_v\mu_1 = 0,\\
{}_w\mu_2 &= {}_u\mu_2 - 2r\sqrt{{}_u\mu_2\;{}_v\mu_2} + {}_v\mu_2,\\
{}_w\mu_3 &= {}_u\mu_3 - 3p_{21} + 3p_{12} - {}_v\mu_3,\\
{}_w\mu_4 &= {}_u\mu_4 - 4p_{31} + 6p_{22} - 4p_{13} + {}_v\mu_4,
\end{aligned} \qquad (11)$$

where $p_{st}$ is the mean value of the product $u^s v^t$. Now, if $w$ be the range, and 
$u$ and $v$ be the first and last individuals respectively, of a sample, then 

$$\sigma_u = \sigma_v, \quad {}_u\mu_2 = {}_v\mu_2; \qquad p_{12} = -p_{21}, \quad {}_u\mu_3 = -{}_v\mu_3; \qquad p_{31} = p_{13}, \quad {}_u\mu_4 = {}_v\mu_4,$$

since the samples are taken from a distribution symmetrical about its origin, and 
we have for the moments of the distribution of the range in terms of those of the 
largest variate 

$$\begin{aligned}
\sigma_w^{2} = {}_w\mu_2 &= 2\,{}_u\mu_2\,(1 - r),\\
{}_w\mu_3 &= 2\,{}_u\mu_3 - 6p_{21},\\
{}_w\mu_4 &= 2\,{}_u\mu_4 - 8p_{31} + 6p_{22}.
\end{aligned} \qquad (11)\ bis$$


* As an example, some of the actual values of the $\beta$’s obtained by cubature are here given: 

n:    20    60    100    200    500 
Moreover, if we assume that the regression between $u$ and $v$ is linear, and that 
the correlation surface of these variates is homoscedastic, we can readily obtain 

$$\begin{aligned}
p_{21} &= r\,\sqrt{{}_u\beta_1}\;\sigma_u^{3},\\
p_{31} &= r\;{}_u\beta_2\;\sigma_u^{4},\\
p_{22} &= \{1 + r^{2}\,({}_u\beta_2 - 1)\}\,\sigma_u^{4}.
\end{aligned} \qquad (12)$$

From the first relation of (11) bis we can find $r$, since $\sigma_w^{2}$ and $\sigma_u^{2}$ have both been 
found, and can use this, and the known constants of the distribution of the largest 
variate in (12), to find the cross-product, or corrective terms of equations (11) bis. 
In this way we can find ${}_w\mu_3$ and ${}_w\mu_4$. 

The legitimacy of these assumptions is, however, doubtful, because taken 
together they amount to an assumption of normality in the distribution of the 
$u$, $v$ surface, which is far from correct. Mr E. S. Pearson (Biometrika, Vol. XVI. 
p. 196) has shown that equations like (12), while giving better values for $p_{21}$, $p_{31}$ 
and $p_{22}$ than the assumption of normality, may lead to worse results than that 
assumption when we substitute such approximations in equations like (11) where 
the higher products have alternate signs. 

In the case of a sample of two we know the actual form of the surface, i.e. it is 
the part of the normal surface $z = \dfrac{1}{\pi}\,e^{-\frac{1}{2}(u^2+v^2)}$ which lies above the diagonal 
$u = v$, and this permits us to calculate the constants of the marginal totals 
which are: 

    $\bar{u} = -\bar{v} = .564,190$, 
    $\sigma_u = \sigma_v = .825,645$,  ${}_u\beta_1 = {}_v\beta_1 = .018,755$,  ${}_u\beta_2 = {}_v\beta_2 = 3.06175$, 

agreeing with the values found in Table I, p. 366. We can find the actual values 
of the $p$’s; they are: 

    $p_{21} = -.077,079$,  $p_{31} = +.622,273$, 
    $p_{22} = +.696,036$,  $r = .466,944$. 

By equations (11) we obtain: 

    ${}_w\mu_2 = .726,758$,   $\sigma_w = .85250$, 
    ${}_w\mu_3 = .616,636$,   ${}_w\beta_1 = .99057$, 
    ${}_w\mu_4 = 2.043,625$,  ${}_w\beta_2 = 3.8692$. 

If we use equations (12) we have: 

    $p_{21} = +.035,992$ instead of $-.077,079$, 
    $p_{31} = +.664,366$ instead of $+.622,273$, 
    $p_{22} = +.673,602$ instead of $+.696,036$. 

Hence from (11) bis we deduce 

    ${}_w\mu_3 = -.061,792$ instead of $.616,636$, 
    ${}_w\mu_4 = 1.572,278$ instead of $2.043,625$, 
    ${}_w\beta_1 = .0099$ instead of $.9906$, 
    ${}_w\beta_2 = 2.977$ instead of $3.8692$. 
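
The arithmetic of this comparison for $n = 2$ can be retraced from the quoted constants. The sketch below assumes the relations $\sigma_w^2 = 2\,{}_u\mu_2(1-r)$, ${}_w\mu_3 = 2\,{}_u\mu_3 - 6p_{21}$ and ${}_w\mu_4 = 2\,{}_u\mu_4 - 8p_{31} + 6p_{22}$ for the range, together with the linear, homoscedastic approximations for the product moments; the function names `range_moments` and `cross_products_linear` are hypothetical.

```python
import math


def range_moments(mu2_u, mu3_u, mu4_u, r, p21, p31, p22):
    """Moments and betas of the range from those of the largest variate."""
    mu2 = 2.0 * mu2_u * (1.0 - r)
    mu3 = 2.0 * mu3_u - 6.0 * p21
    mu4 = 2.0 * mu4_u - 8.0 * p31 + 6.0 * p22
    return mu2, mu3, mu4, mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2


def cross_products_linear(sigma_u, beta1_u, beta2_u, r):
    """Product moments under linear regression and homoscedasticity."""
    p21 = r * math.sqrt(beta1_u) * sigma_u ** 3
    p31 = r * beta2_u * sigma_u ** 4
    p22 = (1.0 + r * r * (beta2_u - 1.0)) * sigma_u ** 4
    return p21, p31, p22


# Constants of the largest of two individuals, as quoted in the text.
sigma_u, beta1_u, beta2_u, r = 0.825645, 0.018755, 3.06175, 0.466944
mu2_u = sigma_u ** 2
mu3_u = math.sqrt(beta1_u) * sigma_u ** 3
mu4_u = beta2_u * sigma_u ** 4

# Exact product moments quoted for n = 2, and the approximated ones.
exact = range_moments(mu2_u, mu3_u, mu4_u, r, -0.077079, 0.622273, 0.696036)
approx_p = cross_products_linear(sigma_u, beta1_u, beta2_u, r)
approx = range_moments(mu2_u, mu3_u, mu4_u, r, *approx_p)
```

Here `exact` holds the second, third and fourth moments of the range followed by its two betas, and `approx` the corresponding values when the approximated product moments are substituted.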
These results have been given to indicate that (12) cannot be trusted for very 
small samples. There is no approach to linearity. For $n = 2$ the regression curve 
can be plotted from the table of the probability integral. It is a hyperbolic-looking 
curve asymptoting to $v = 0$ and $u = v$. Neither homoscedasticity nor 
linearity of regression is true. The correlation of $u$ and $v$, however, rapidly 
diminishes, and $r$ is very small for a sample of 20. Whether the assumptions 
which lead to (12) give better values for $\beta_1$ and $\beta_2$ for the distribution of ranges 
between $n = 2$ and $n = 20$, I am not prepared to say. The hypothesis of zero 
correlation and the assumptions involved in (12) are shown for a series of values in 
Table IV, under the columns (a) and (b) respectively. Diagrams VI, VII and VIII 








Diagram VII. 


illustrate the results of Table IV graphically, and from them the standard 
deviation, and $\beta_1$ and $\beta_2$, can be read off for other values of $n$. The nature of the early 
part of the $\beta_1$, $\beta_2$ curves cannot be predicted without special investigation. 


For the small samples ($n = 10$ and $n = 20$), the corrective terms are fairly 
considerable, and little stress is to be laid on the values given for $\beta_1$ and $\beta_2$, 
although it will be seen that they give fits to experimental data which are not 
unreasonable. For samples of 60 or more, the cross-product terms are small, and 
the values of $\beta$ may be taken as correct for most practical purposes. It must be 

























376 Range between Extreme Individuals 


remembered that the third and fourth moments have large probable errors, which 
fact indicates that the form of distribution is not extremely sensitive to them, so 
that quite approximate values often give good fits. 


TABLE IV. 
Constants of Distribution of Ranges*. 

Size of      Standard          $\beta_1$                  $\beta_2$ 
Sample n     Deviation     (a)       (b)        (a)        (b)         r 
     2         .853      [.991†]     .010     [3.869†]    2.977      .467 
    10         .797       .084       .063      3.166      3.138      .078 
    20         .729       .125       .111      3.234      3.217      .037 
    60         .639       .188       .181      3.338      3.330      .012 
   100         .605       .215       .210      3.382      3.377      .007 
   200         .566       .247       .247      3.438      3.438      .00(0) 
   500         .524       .285       .285      3.502      3.502      .00(0) 
  1000         .497       .309       .309      3.544      3.544      .00(0) 

Diagram VIII. 

* Excepting for the Standard Deviation, little reliance can be placed on the last figure of the 
quantities given in this table. 
† Exact values. 
It will be seen from Table IV that the curves to be fitted are Type VI, excepting 
for $n = 10$ and $n = 20$, for which a Type IV is appropriate. One would expect the 
true curve to be one limited in the negative direction to zero range (since negative 
ranges are impossible) and extended to infinity in the positive direction. These 
curves are, however, very near to Type V, which does satisfy these conditions, 
while the $\beta$’s are obtained by approximative assumptions and so are not exact. 
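
The assignment of types can be retraced from $\beta_1$ and $\beta_2$. The paper does not state which criterion was used; the sketch below assumes the usual criterion of Pearson's system, $\kappa = \beta_1(\beta_2+3)^2 / \{4(4\beta_2 - 3\beta_1)(2\beta_2 - 3\beta_1 - 6)\}$, with Type IV for $0 < \kappa < 1$, Type V for $\kappa = 1$ and Type VI for $\kappa > 1$. The function names are hypothetical.

```python
def pearson_kappa(beta1: float, beta2: float) -> float:
    """Pearson's criterion kappa computed from beta1 and beta2."""
    return (beta1 * (beta2 + 3.0) ** 2
            / (4.0 * (4.0 * beta2 - 3.0 * beta1) * (2.0 * beta2 - 3.0 * beta1 - 6.0)))


def pearson_type(beta1: float, beta2: float) -> str:
    """Main type indicated by kappa (transitional types other than V ignored)."""
    kappa = pearson_kappa(beta1, beta2)
    if kappa < 0.0:
        return "I"
    if kappa < 1.0:
        return "IV"
    if kappa > 1.0:
        return "VI"
    return "V"


# Columns (b) of Table IV for n = 10, 20, 60 and 1000.
types = {n: pearson_type(b1, b2)
         for n, b1, b2 in [(10, 0.063, 3.138), (20, 0.111, 3.217),
                           (60, 0.181, 3.330), (1000, 0.309, 3.544)]}
```

With the constants of columns (b), $\kappa$ lies below unity for the two smaller samples and only moderately above it for the larger, which is in keeping with the remark that the curves are near to Type V.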


The constants for samples of less than 10 have not been attempted, except in 
the case n= 2, which has been treated by Irwin (5) and is considered more in 
detail here. 


IV. DETERMINATION OF STANDARD DEVIATION BY THE MEAN RANGE. 


Since the mean range given in the tables is in terms of the standard deviation 
of the original population, in any given case, the latter can be found by taking 
samples, determining the mean range, and dividing it by the value given by the 
table. This method is rapid, but subject to larger sampling errors than the moment 
method. It is similar to one given by Pearson (6), in which the sample is ranked, 
and the difference between two certain individuals measured, and divided by the 
value for a population having unit standard deviation. He found that best results 
were obtained by taking individuals near the quindeciles (those n/15 from each end). 
The standard errors of the three methods are compared in Table V. Since the 
distributions are skew, this quantity has no precise meaning, but it will give a 
good appreciation of the accuracy of the method. 
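
As an illustration of the method of ranges, the sketch below estimates the standard deviation of an artificial normal population from the mean range of samples of 10. The divisor 3.07751 is the theoretical mean range for $n = 10$ quoted in this paper; the population (mean 50, standard deviation 2.5), the seed and the function name `sigma_from_ranges` are assumptions made purely for the example.

```python
import random

# Theoretical mean range for samples of 10 from a unit normal population.
MEAN_RANGE_N10 = 3.07751


def sigma_from_ranges(samples, tabled_mean_range: float) -> float:
    """Standard deviation estimated as (mean of sample ranges) / (tabled mean range)."""
    ranges = [max(sample) - min(sample) for sample in samples]
    return (sum(ranges) / len(ranges)) / tabled_mean_range


# Artificial data: 2000 samples of 10 from a normal population (assumed values).
rng = random.Random(1925)
samples = [[rng.gauss(50.0, 2.5) for _ in range(10)] for _ in range(2000)]
estimate = sigma_from_ranges(samples, MEAN_RANGE_N10)
```

Only the largest and smallest member of each sample enter the calculation, which is the source both of the rapidity of the method and of its larger sampling error.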

The method of ranges gives better results when the samples are many and 
small than when they are few and large. Thus, ten samples of 10 give the standard 
deviation with a standard error of 8%, while one sample of 100 gives it with a 
standard error of 12%. 



It is very clear that the method of ranges is markedly inferior to the method 
of moments, and, sample for sample, to the method of ranking. It has the advantage 
however that it is easy to apply, even for large samples. 
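
The comparison of ten samples of 10 with one sample of 100 can be retraced from the tabled constants of the range. The sketch assumes that the relative standard error of the estimate is the standard deviation of the range divided by the mean range and by the square root of the number of samples, and that the method of moments has standard error $1/\sqrt{2N}$ for $N$ observations in all; the function names are hypothetical.

```python
import math


def se_ranges(sd_of_range: float, mean_range: float, samples: int) -> float:
    """Relative standard error of the standard deviation found from mean range."""
    return sd_of_range / (mean_range * math.sqrt(samples))


def se_moments(observations: int) -> float:
    """Relative standard error of the standard deviation by the moment method."""
    return 1.0 / math.sqrt(2.0 * observations)


# Constants for n = 10 and n = 100 taken from the tables of this paper.
ten_samples_of_10 = se_ranges(0.797, 3.07751, 10)
one_sample_of_100 = se_ranges(0.605, 5.0152, 1)
moments_100 = se_moments(100)
```

On these figures the hundred observations give about 8% when taken as ten ranges of 10, about 12% when taken as a single range, and about 7% by the method of moments.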


TABLE V. 
Standard Errors of Standard Deviation of Normal Population 
(Comparison of Various Methods). 

                          Method of               Method of Ranges 
Size of      Method of    taking          One sample    Five samples    Ten samples 
Sample n     Moments      Quindeciles     of n          of n            of n 
    10        .2236        .2769           .2590         .1158           .0819 
    20        .1581        .1958           .1952         .0873           .0617 
    60        .0912        .1130           .1377         .0616           .0436 
   100        .0707        .0876           .1207         .0540           .0382 
   200        .0500        .0619           .1030         .0461           .0326 
   500        .0316        .0391           .0862         .0386           .0273 
  1000        .0224        .0277           .0767         .0343           .0243 


V. EXPERIMENTAL VERIFICATION. 


It was considered worth while to make some sampling experiments in order to 
illustrate and confirm the results of this paper. 


A population of 1000 individuals was made by distributing them among 
27 categories according to the normal law, and with unit standard deviation. The 
width of each subrange was $.2\sigma$. The individuals were very small cards, which were 
drawn one at a time from a bag, the card being replaced and the contents of the 
bag well mixed between each draw. In this way, 5000 draws were made, so that 
one had effectively a sample of 5000 from an infinite normal population. These 
were compounded to give 500 samples of 10, and 500 samples of 20. These 
experiments gave for the mean range 2.9452 ± .0240 for $n = 10$ (theoretical value 3.07751), 
and 3.6034 ± .0220 for $n = 20$ (theoretical value 3.73495). The discrepancies between 
theory and practice are too great to be attributed to ordinary errors of random 
sampling, so it was concluded that the mixing between each draw had not been 
sufficient, and there was a tendency for neighbouring draws to be alike, leading to 
a low value of the mean range. The fact that later and more careful experiments 
removed the discrepancies makes this explanation plausible. The mean range thus 
supplies a fairly sensitive test of the randomness of a series of observations. 

For the second experiment, a population of 10,000 individuals was made in 
a similar way, but was divided into 69 classes, the breadth of each class being $.1\sigma$. 
The individuals were numbered from 0000 to 9999, those in the same class having 
consecutive numbers; and a key was constructed, so that given a number, one 
could find the class to which the individual bearing that number belonged. Then 
40,000 digits were taken at random from census reports, and combined by fours to 
give 10,000 numbers taken at random from 0000 to 9999 inclusive*. By means of 
the key, these numbers were converted into a random sample of the original 
population, which was then divided up to give 1000 samples of 5 (only half were 
taken), 1000 of 10 and 500 of 20. 
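
The construction of the key may be sketched as follows. The details of the author's key are not given, so everything here is an assumption for illustration: the class limits are taken at multiples of .1 from -3.45 to +3.45, the cumulative class frequencies are rounded from the normal probability integral, and the four-figure numbers are drawn from Python's pseudo-random generator in place of digits from census reports. The names `build_key` and `lookup` are hypothetical.

```python
import bisect
import math
import random


def normal_cdf(x: float) -> float:
    """Probability integral of the unit normal curve up to x."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def build_key(population: int = 10_000, classes: int = 69, width: float = 0.1):
    """Cumulative class limits (as individual numbers) and the class centres."""
    half = classes * width / 2.0
    edges = [-half + i * width for i in range(classes + 1)]
    cumulative = [round(population * normal_cdf(edge)) for edge in edges]
    cumulative[0], cumulative[-1] = 0, population  # tails thrown into end classes
    centres = [round(-half + (i + 0.5) * width, 2) for i in range(classes)]
    return cumulative, centres


def lookup(number: int, key) -> float:
    """Centre of the class to which the individual bearing `number` belongs."""
    cumulative, centres = key
    return centres[bisect.bisect_right(cumulative, number) - 1]


key = build_key()
rng = random.Random(0)
# 1000 samples of 10, each individual found from a random four-figure number.
ranges = []
for _ in range(1000):
    sample = [lookup(rng.randrange(10_000), key) for _ in range(10)]
    ranges.append(max(sample) - min(sample))
mean_of_ranges = sum(ranges) / len(ranges)
```

Individuals in the same class bear consecutive numbers, so that a single search of the cumulative limits converts any number from 0000 to 9999 into a class centre.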

In order to find the range, it was assumed that the individuals were concentrated 
at the means of the groups. Table VI gives the distributions found, the 
experimental groups having been combined in pairs. The theoretical distributions with 
which they are compared were found by fitting Type IV curves, using the constants 
given in columns (b) of Table IV. In most cases there is good agreement between 
the theoretical and experimental values of the constants, the most serious 
discrepancies being in the case of the mean for $n = 5$ (the difference is about 2.9 times 
the probable error) and the standard deviation for $n = 10$ (the difference is about 
2.2 times the probable error). The theoretical curves fit the data moderately well 
($n = 10$: $P = .119$; $n = 20$: $P = .455$), indicating that even for the smaller samples, the 
values of $\beta_1$ and $\beta_2$ given in Table IV are not unreasonable, and are sufficient for 
practical purposes. In Diagrams IX and X, the theoretical curve and histogram of 
the experimental data for the two cases have been superimposed. The equations 
to the curves are 


* By means of a suitable key, these numbers can be used to construct samples from any population. 
It is hoped that they will be published later. 


n=10: y= ye? cos 9, « = 63614 tan 0, log y, = 30°26265. 


Origin at — 70567 to left of mean; mode, 2°9803. 


n= 20: y= ype? cos 9, «2 =31734tan 0, log y = 9665328. 


Origin at — 74748 to left of mean; mode, 3°6173. 
TABLE VI. 
Distribution of Ranges. 

                         n = 10                          n = 20 
Range           Experimental   Theoretical     Experimental   Theoretical 
                Frequency      Frequency       Frequency      Frequency 
 .85–1.05            1  } 
1.05–1.25            1  } 
1.25–1.45            4  }  19     27.9              –              – 
1.45–1.65           13  } 
1.65–1.85           20            24.6              –              – 
1.85–2.05           31            39.3            } 
2.05–2.25           62            56.0            }  19           13.7 
2.25–2.45           75            71.7            } 
2.45–2.65           75            86.1               9           14.4 
2.65–2.85          101            97.1              24           23.4 
2.85–3.05          114           101.1              35           34.3 
3.05–3.25           92            98.5              40           44.3 
3.25–3.45          108            90.9              36           51.7 
3.45–3.65           88            78.7              70           55.5 
3.65–3.85           57            64.2              54           54.9 
3.85–4.05           51            50.6              51           50.9 
4.05–4.25           40            38.7              44           43.6 
4.25–4.45           17            26.6              34           34.7 
4.45–4.65           11            18.0              25           26.4 
4.65–4.85           13            12.2              20           19.0 
4.85–5.05           10             7.4              17           13.6 
5.05–5.25         }                                  8            8.6 
5.25–6.45         }  16           10.4              14           11.0 

Totals            1000          1000.0             500          500.0 

Mean           3.1016 ± .0170   3.07751       3.7508 ± .0220    3.73495 
$\sigma$              …               .797          .731 ± .0155     .729 
$\beta_1$           .184 ± .042      .063          .063 ± .064      .111 
$\beta_2$          3.397 ± .156     3.138         3.026 ± .248     3.217 
$\chi^2$                  25.248                          14.949 
$P$                     .119                            .455 

n = 5 (Experimental Frequency): Total 1000; Mean 2.3798 ± .0189; $\sigma$ … ± .0134. 
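
The values of $\chi^2$ can be retraced from the grouped frequencies of Table VI, the tail classes being pooled as in the table. The sketch below computes only $\chi^2$ and not $P$; the list and function names are hypothetical.

```python
def chi_square(observed, expected) -> float:
    """Sum of (observed - expected)**2 / expected over the groups."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# n = 10: nineteen groups, the first and last being pooled tails.
observed_10 = [19, 20, 31, 62, 75, 75, 101, 114, 92, 108,
               88, 57, 51, 40, 17, 11, 13, 10, 16]
expected_10 = [27.9, 24.6, 39.3, 56.0, 71.7, 86.1, 97.1, 101.1, 98.5, 90.9,
               78.7, 64.2, 50.6, 38.7, 26.6, 18.0, 12.2, 7.4, 10.4]

# n = 20: sixteen groups, the first and last being pooled tails.
observed_20 = [19, 9, 24, 35, 40, 36, 70, 54, 51, 44, 34, 25, 20, 17, 8, 14]
expected_20 = [13.7, 14.4, 23.4, 34.3, 44.3, 51.7, 55.5, 54.9, 50.9, 43.6,
               34.7, 26.4, 19.0, 13.6, 8.6, 11.0]

chi2_10 = chi_square(observed_10, expected_10)
chi2_20 = chi_square(observed_20, expected_20)
```

The largest single contributions for $n = 10$ come from the groups 3.25 to 3.45 and 4.25 to 4.45 and from the two pooled tails.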
Table VII gives the correlation surface between the two extreme individuals 
in samples of 10. A symmetrical table has been formed by counting each extreme 
twice, as the first and as the last member. The constants given below the table 

TABLE VII. 
Extremes of Sample. n= 10. 


Central Values of First Variate (in terms of o). 














Central Values of Last Variate (in terms of o). 




















are the coefficient of correlation $r$, the correlation ratio $\eta$, and the usual constants 
of the distribution of the first variate. The corresponding theoretical values are 
given in brackets. In the case of $r$, there is no significant discrepancy, and 
moreover $\eta$ does not differ significantly from it, so that the non-linearity of the 
regression is not apparent in the sample. As for the constants of the first variate 
distribution (which are given by the marginal totals), there is good agreement for 
the mean, while for the standard deviation, and for $\beta_1$ and $\beta_2$, the difference between 
the theoretical and experimental values is between two and three times the probable 
error, a difference which is scarcely significant, but still rather disappointingly 
large. 

In Table VIII, the results for samples of 20 are given. The correlation coefficient 
does not differ significantly from the theoretical value, nor from zero, while $\eta^2$ 
does not differ from $\bar{\eta}^2$ (the square of the correlation ratio for zero association) by 
as much as the probable error. Thus, the sample gives no evidence of association 
between the extreme individuals, as indeed one would expect; the theoretical 
correlation is too small to show up in such a sample. The constants of the 
distribution of the first variate are, on the whole, in accordance with the theoretical 
values*. 

TABLE VIII. 
Correlation between Extremes in Samples of 20. 

Last Extreme. 

$r = .015 \pm .021$ (.037),   $\eta^2 = .0084$,   $\bar{\eta}^2 = .0070$. 

Mean = 1.8758 ± .0076 (1.8675),   $\sigma$ = .505 ± .005 (.525),   $\beta_1$ = .134 ± .047 (.251),   $\beta_2$ = 3.245 ± .177 (3.469). 


A further experiment was made to see if the mean range, as found for a normal 
population, is very different from that of a skew population. The distribution 
chosen was 

10,000 _. . ; = ss or 
= 9.984 e~*z’, having $\sigma = 9.988$, $\beta_1 = .5000$, $\beta_2 = 3.75$. 


It was divided into 76 categories, and by means of a new key, was sampled by the 
numbers used previously. The mean range, as given by 460 samples of 10, was 
29.698 ± .248, while the value for a normal distribution would be 30.738. The 
difference is about 4.2 times the probable error, and would not be significant in 
100 samples. This is a very skew distribution, so that for many practical purposes 
one may assume the mean range for the normal curve, even when the population 
sampled departs markedly from normality. It is not suggested, however, that 
serious discrepancies would not occur if the results were applied to curves of U or 
J type, or to curves having very limited range. 

* The samples of twenty were formed by combining those of ten in pairs, the first and second, the 
third and fourth, and so on. To give smoother regression, the other pairs (the second and third, the 
fourth and fifth, and so on) were also combined, so that 2000 samples of ten gave 2000 samples of 
twenty. These, however, are not independent, so that the probable errors (calculated for 2000) are 
under-estimated. Indeed, the addition of the second thousand made practically no difference to the 
distribution in the marginal totals; thus it may be held that the probable errors should have been 
calculated for a sample of near 1000 rather than of one of 2000. 
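
The statement about 100 samples can be retraced on the assumption that the probable error of a mean range varies inversely as the square root of the number of samples. The figures are those quoted in the text; the variable names are hypothetical.

```python
import math

normal_value = 30.738      # mean range for a normal curve with the same sigma
observed = 29.698          # mean range from 460 samples of 10
probable_error_460 = 0.248

difference = normal_value - observed
ratio_460 = difference / probable_error_460

# Probable error rescaled from 460 samples to 100 samples.
probable_error_100 = probable_error_460 * math.sqrt(460 / 100)
ratio_100 = difference / probable_error_100
```

With 100 samples the same difference would be a little under twice its probable error, which is why it would pass unnoticed in an experiment of that size.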

Diagram XI. 
Diagram XII. 


TABLE IX. 





Probability Integral of Distribution of Largest Individual in Samples 
of Size n taken from Normal Population. 
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TABLE IX.—(continued). 


Probability Integral of Distribution of Largest Individual in Samples 
of Size n taken from Normal Population. 
.88614  .90282  .91904  .93480  .95014  .96508  .97963  .99382  6.00766  6.02117  6.03437  6.04726  6.05986  6.07218 

L. H. C. TIPPETT 

TABLE X. (continued). 

Mean Range of Samples of Size n taken from Normal Population 
(given in terms of Standard Deviation). 

| n | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| 500 | 6.07340 | 6.07461 | 6.07583 | 6.0770[?] | 6.07824 | 6.07945 | 6.08065 | 6.08155 | 6.08304 | 6.08424 |
| 510 | 6.08543 | 6.08662 | 6.08781 | 6.08899 | 6.09017 | 6.09135 | 6.09253 | 6.09370 | 6.09487 | 6.09604 |
| 520 | 6.09721 | 6.09837 | 6.09953 | 6.10069 | 6.10185 | 6.10301 | 6.10416 | 6.10531 | 6.10646 | 6.10760 |
| 530 | 6.10874 | 6.10988 | 6.11102 | 6.11216 | 6.11329 | 6.11442 | 6.11555 | 6.11668 | 6.11780 | 6.11892 |
| 540 | 6.12004 | 6.12116 | 6.12228 | 6.12339 | 6.12450 | 6.12561 | 6.12672 | 6.12782 | 6.12892 | 6.13002 |
| 550 | 6.13112 | 6.13222 | 6.13331 | 6.13440 | 6.13549 | 6.13658 | 6.13766 | 6.13874 | 6.13982 | 6.14090 |
| 560 | 6.14198 | 6.14305 | 6.14413 | 6.14520 | 6.14626 | 6.14733 | 6.14839 | 6.14946 | 6.15052 | 6.15157 |
| 570 | 6.15263 | 6.15368 | 6.15474 | 6.15579 | 6.15683 | 6.15788 | 6.15892 | 6.15996 | 6.16101 | 6.16204 |
| 580 | 6.16308 | 6.16411 | [illegible] | 6.16618 | 6.16720 | 6.16823 | 6.16926 | 6.17028 | 6.17130 | 6.17232 |
| 590 | 6.17333 | 6.17435 | 6.17536 | 6.17638 | 6.17738 | 6.17839 | 6.17940 | 6.18040 | 6.18140 | 6.18241 |
| 600 | 6.18340 | 6.18440 | 6.18540 | 6.18639 | 6.18738 | 6.18837 | 6.18936 | 6.19034 | 6.19133 | 6.19231 |
| 610 | 6.19329 | 6.19427 | 6.19525 | 6.19623 | 6.19720 | 6.19817 | 6.19914 | 6.20011 | 6.20108 | 6.20204 |
| 620 | 6.20301 | 6.20397 | 6.20493 | 6.20589 | 6.20685 | 6.20780 | 6.20876 | 6.20971 | 6.21066 | 6.21161 |
| 630 | 6.21255 | 6.21350 | 6.21444 | 6.21539 | 6.21633 | 6.21727 | 6.21820 | 6.21914 | 6.22007 | 6.22101 |
| 640 | 6.22194 | 6.22287 | 6.22379 | 6.22472 | 6.22565 | 6.22657 | 6.22749 | 6.22841 | 6.22933 | 6.23025 |
| 650 | 6.23116 | 6.23208 | 6.23299 | 6.23390 | 6.23481 | 6.23572 | 6.23662 | 6.23753 | 6.23843 | 6.23934 |
| 660 | 6.24024 | 6.24113 | 6.24203 | 6.24293 | 6.24382 | 6.24472 | 6.24561 | 6.24650 | 6.24739 | 6.24827 |
| 670 | 6.24916 | 6.25005 | 6.25093 | 6.25181 | 6.25269 | 6.25357 | 6.25445 | 6.25532 | 6.25620 | 6.25707 |
| 680 | 6.25794 | 6.25881 | 6.25968 | 6.26055 | 6.26142 | 6.26228 | 6.26315 | 6.26401 | 6.26487 | 6.26573 |
| 690 | 6.26659 | 6.26744 | 6.26830 | 6.26915 | 6.27001 | 6.27086 | 6.27171 | 6.27256 | 6.27340 | 6.27425 |
| 700 | 6.27510 | 6.27594 | 6.27678 | 6.27762 | 6.27846 | 6.27930 | 6.28014 | 6.28097 | 6.28181 | 6.28264 |
| 710 | 6.28347 | 6.28430 | 6.28513 | 6.28596 | 6.28679 | 6.28762 | 6.28844 | 6.28926 | 6.29009 | 6.29091 |
| 720 | 6.29173 | [illegible] | 6.29336 | 6.29418 | 6.29499 | 6.29581 | 6.29662 | 6.29743 | 6.29824 | 6.29905 |
| 730 | 6.29985 | 6.30066 | 6.30147 | 6.30227 | 6.30307 | 6.30387 | 6.30467 | 6.30547 | 6.30627 | 6.30707 |
| 740 | 6.30786 | 6.30866 | 6.30945 | 6.31024 | 6.31103 | 6.31182 | 6.31261 | 6.31340 | 6.31419 | 6.31497 |
| 750 | 6.31576 | 6.31654 | 6.31732 | 6.31810 | 6.31888 | 6.31966 | 6.32044 | 6.32121 | 6.32199 | 6.32276 |
| 760 | 6.32353 | 6.32431 | 6.32508 | 6.32585 | 6.32661 | 6.32738 | 6.32815 | 6.32891 | 6.32968 | 6.33044 |
| 770 | 6.33120 | 6.33197 | 6.33273 | 6.33348 | 6.33424 | 6.33500 | 6.33575 | 6.33651 | 6.33726 | 6.33802 |
| 780 | 6.33877 | 6.33952 | 6.34027 | 6.34102 | 6.34176 | 6.34251 | 6.34325 | 6.34400 | 6.34474 | 6.34548 |
| 790 | 6.34623 | 6.34697 | 6.34771 | 6.34844 | 6.34918 | 6.34992 | 6.35065 | 6.35139 | 6.35212 | 6.35285 |
| 800 | 6.35358 | 6.35431 | 6.35504 | 6.35577 | 6.35650 | 6.35722 | 6.35795 | 6.35867 | 6.35940 | 6.36012 |
| 810 | 6.36084 | 6.36156 | 6.36228 | 6.36300 | 6.36372 | 6.36433 | 6.36515 | 6.36587 | 6.36658 | 6.36729 |
| 820 | 6.36800 | 6.36872 | 6.36942 | 6.37013 | 6.37084 | 6.37155 | 6.37226 | 6.37296 | 6.37367 | 6.37437 |
| 830 | 6.37507 | 6.37577 | 6.37647 | 6.37717 | 6.37787 | 6.37857 | 6.37927 | 6.37997 | 6.38066 | 6.38136 |
| 840 | 6.38205 | 6.38274 | 6.38343 | 6.38412 | 6.38482 | 6.38550 | 6.38619 | 6.38688 | 6.38757 | 6.38825 |
| 850 | 6.38894 | 6.38962 | 6.39031 | 6.39099 | 6.39167 | 6.39235 | 6.39303 | 6.39371 | 6.39438 | 6.39506 |
| 860 | 6.39574 | 6.39641 | 6.39709 | 6.39776 | 6.39843 | 6.39911 | 6.39978 | 6.40045 | 6.40112 | 6.40179 |
| 870 | 6.40245 | 6.40312 | 6.40379 | 6.40445 | 6.40512 | 6.40578 | 6.40644 | 6.40711 | 6.40777 | 6.40843 |
| 880 | 6.40909 | 6.40975 | 6.41040 | 6.41106 | 6.41172 | 6.41237 | 6.41303 | 6.41368 | 6.41433 | 6.41499 |
| 890 | 6.41564 | 6.41629 | 6.41694 | 6.41759 | 6.41824 | 6.41889 | 6.41953 | 6.42018 | 6.42082 | 6.42147 |
| 900 | 6.42211 | 6.42276 | 6.42340 | 6.42404 | 6.42468 | 6.42532 | 6.42596 | 6.42660 | 6.42723 | 6.42787 |
| 910 | 6.42851 | 6.42914 | 6.42978 | 6.43041 | 6.43104 | 6.43168 | 6.43231 | 6.43294 | 6.43357 | 6.43420 |
| 920 | 6.43483 | 6.43546 | 6.43608 | 6.43671 | 6.43734 | 6.43796 | 6.43858 | 6.43921 | 6.43983 | 6.44045 |
| 930 | 6.44108 | 6.44170 | 6.44232 | 6.44294 | 6.44355 | 6.44417 | 6.44479 | 6.44540 | 6.44602 | 6.44664 |
| 940 | 6.44725 | 6.44786 | 6.44848 | 6.44909 | 6.44970 | 6.45031 | 6.45092 | 6.45153 | 6.45214 | 6.45275 |
| 950 | 6.45335 | 6.45396 | 6.45457 | 6.45517 | 6.45578 | 6.45638 | 6.45698 | 6.45759 | 6.45819 | 6.45879 |
| 960 | 6.45939 | 6.45999 | 6.46059 | 6.46119 | 6.46178 | 6.46238 | 6.46298 | 6.46357 | 6.46417 | 6.46476 |
| 970 | 6.46536 | 6.46595 | 6.46654 | 6.46714 | 6.46773 | 6.46832 | 6.46891 | 6.46950 | 6.47008 | 6.47067 |
| 980 | 6.47126 | 6.47185 | 6.47243 | 6.47302 | 6.47360 | 6.47419 | 6.47477 | 6.47535 | 6.47594 | 6.47652 |
| 990 | 6.47710 | 6.47768 | 6.47826 | 6.47884 | 6.47942 | 6.47999 | 6.48057 | 6.48115 | 6.48172 | 6.48230 |
| 1000 | 6.48287 | | | | | | | | | |














BAYES’ THEOREM, EXAMINED IN THE LIGHT 
OF EXPERIMENTAL SAMPLING. 


By EGON S. PEARSON, M.A. 


1. IN reading from the long succession of papers, and chapters in books on the 
theory of probability that have been written on the problem of “inverse prob- 
abilities” during the last century and a half, it is difficult not to be struck by the 
absence of any systematic attempt to put the theoretical rules to the test. Both 
the supporters and detractors of what has been termed Bayes’ Theorem have relied 
almost entirely on the logic of their argument; this has been so from the time 
when Price*, communicating Bayes’ notes to the Royal Society, first dwelt on the 
definite rule by which a man fresh to this world ought to regulate his expectation 
of succeeding sunrises, up to recent days when Keynes† has argued that it is 
almost discreditable to base any reliance on so foolish a theorem. Such appeals to 
experience as have been made are of little final value; Venn‡, it is true, realised 
that the rule could not be tested by single instances but only in the long run of 
experience, yet after giving three unfavourable cases he was content to leave 
matters with the assurance that the experimenter “could hardly fail” to find 
himself “ grossly wrong” if he based his expectation on the rule in further trials. 
The main reason why the critics have failed to produce any solid evidence in 
favour of their contentions, is perhaps because they have felt that the “equal 
distribution of ignorance” and the “distribution of a priori probabilities” are 
conceptions too intangible and unreal to be brought to the test of numbers, but 
the supporters of the theory can have no excuse for leaving matters as they stand 
except the plea of the time and labour involved in collecting data. If Bayes’ 
Theorem is of practical value to the statistician it must be possible to show by 
a series of ad hoc experiments that the predictions which it makes are in fact 
justified. My excuse therefore in returning to this much and long discussed 
subject is to describe the results of a long series of experimental samplings which 
have been carried out at intervals during the last four years with a view to making 
more clear the use and the limitations of Bayes’ Theorem in the field of practical 
statistics. But before dealing with the experiments it will be helpful to consider 
certain aspects of the problem from the theoretical point of view. 

* Phil. Trans. Vol. LIII. 1763, p. 370 et seq. 

† A Treatise on Probability, 1921, p. 382. 

‡ The Logic of Chance, 1876, p. 180. 

§ loc. cit. 

According to Price§, Bayes originally reached his result by a “very ingenious 
solution” based on the supposition that “the chance was the same that the 
probability for the happening of an event perfectly unknown should lie between any 
two equidistant degrees.” But afterwards Bayes “considered that the postulate 
on which he had argued might not perhaps be looked upon by all as reasonable ; 
and therefore he chose to lay down in another form the proposition in which he 
thought the solution of the problem is contained.” That is to say after first looking 
at the problem from the point of view of the distribution of chances, he developed 
the proof which made use of the idea of balls dropped on a table. A first ball is 
dropped which is equally likely to rest anywhere on the table, and this is followed 
by the dropping of n other balls also equally likely to rest anywhere, of which p 
fall on one side of the first ball and q on the other. Contained in this proof is 
the idea of an event depending on the value of some underlying variate ; happening 
when the variate is above a limiting value, failing to happen when it is below this 
value. This method of approach was passed over by the later writers, who with 
Laplace started simply from the idea of an equal distribution of chances, and until 
recently the controversy that has raged has been concerned almost entirely with 
what Venn termed the Rule of Succession*. This rule concerns itself with the 
chance of an event reappearing in a single trial only; it is usually given as follows. 
If an event has been observed to occur p times out of n trials under certain 
conditions, then the probability of its recurrence under the same conditions is 
(p +1)/(n +2). The rule was sometimes given with p = 0, and applied occasionally 
solely to the prediction of events which had never been known to occur before ; 
it is partly this one-sided application of the theorem which has been cited as 
bringing discredit on the whole theory of inverse probabilities. 
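
As a small numerical illustration of the rule just stated, the following Python sketch evaluates (p + 1)/(n + 2) exactly. The function name `rule_of_succession` and the chosen values of p and n are illustrative assumptions only, not part of the original argument.

```python
from fractions import Fraction

def rule_of_succession(p: int, n: int) -> Fraction:
    """Probability of recurrence after p occurrences in n trials: (p + 1)/(n + 2)."""
    if not 0 <= p <= n:
        raise ValueError("need 0 <= p <= n")
    return Fraction(p + 1, n + 2)

# The one-sided case p = 0: an event never yet observed in n trials.
never_seen = [rule_of_succession(0, n) for n in (0, 1, 10, 100)]

# An event observed on every one of 10 trials.
always_seen = rule_of_succession(10, 10)
```

With p = 0 the rule still assigns a chance of 1/(n + 2) to an event that has never been observed, which is the one-sided application referred to in the text.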


Bayes’ conception that the happening of an event depends on the value of an 
underlying variate has recently been extended by K. Pearson in this Journal†. The 
assumptions involved in his method of approach are put out clearly in the last of 
these papers but the position may be summarised as follows. A certain character 
observed among the individuals of a population can be considered as dependent 
on some underlying variate, x, such that the character appears when x exceeds a 
limiting value, ξ, but fails to appear when x is less than ξ. We are then concerned 
with two functions: 

f(x), the frequency curve for values of x in the population, 
φ(ξ), the frequency curve giving the a priori possible values of ξ. 

To reach the final result it is necessary to assume that the functions f and φ 
are identical. This would be the case naturally in a simple extension of Bayes’ 
idea of balls on a billiard table, for here the factors determining the position of 
the first ball dropped would be the same as those determining that of each of the 
subsequent balls‡, but in general this correspondence does not exist and it seems 
to me often difficult to feel confident that it is justifiable to link f with φ. 


* loc. cit. p. 176. 

† Biometrika, Vol. XIII. pp. 1–16; 300, 301. Vol. XVI. pp. 190–193. 

‡ That is to say, if we may suppose that the presence of previously dropped balls does not affect the 
rolling of those that follow. 
A concrete example will perhaps illustrate the difficulty. Suppose we are 
considering a population of human beings in which the character under consideration 
is “a temperature over 100° F.” This temperature of the body depends in some 
way on its functioning, on some factor of metabolism; call this factor x and suppose 
that its distribution among the individuals is represented by f(x). At a certain 
value ξ of the metabolic factor, the temperature in the individual rises to 100° F.; 
we do not know this value ξ nor the distribution of possible values but we suppose 
it to be φ(ξ). Then in order to reach Bayes’ Theorem we assume that f(x) = c.φ(x). 
Can we feel confident that this is a reasonable assumption? The frequency 
distribution of x is something perfectly definite, if unknown to us, depending entirely 
on the factors influencing temperature. But the value of ξ depends on the arbitrary 
temperature limit that we have fixed; we might have chosen 99° F. or 150° F. and 
should have had the same justification in applying Bayes’ Theorem in each case. 
Can we then relate φ to f, and are we really justified in assuming as a general 
rule that the “most likely” value of ξ is the modal value of x (which follows if 
f = c.φ)? We cannot appeal directly to experience, for no method of practical 
sampling would I think enable us to determine the form and relationship of f 
and φ. But this should be noted; if F(P) is a function representing the distribution 
of probabilities or proportions in the world of experience, it has been shown* 
that the relation f = c.φ necessarily implies F = constant. Now this distribution 
F(P) is of a more tangible nature and can be made the subject for experimental 
inquiry. I prefer therefore to approach the problem from a rather different point 
of view, an approach which appears to lead up more naturally to the series of 
experimental samplings to be discussed, than one which involves the idea of the f 
and φ functions. 


* Biometrika, Vol. XVI. p. 192. 

2. Suppose a man to be surrounded with a number, L, of bags each containing 
a large number of black and white balls. In a particular bag the proportion of 
black to white balls is P/Q, where P + Q = 1; that is to say with each bag is 
associated a proportion P, and of the L bags there are L.F(P) with proportion P. 
For the moment I do not assume F(P) to be a continuous function. The man 
now draws first n balls and then m out of a bag of unknown constitution (if the 
number of balls in each bag were not very large he would need to replace the ball 
after each draw); he repeats this process N times, always choosing a bag at random. 
If N is made very large, then in the limit N.F(P) of these draws will come from 
“P-bags,” and again in the limit he will expect to find that in 

$$\frac{n!}{p!\,q!}\,P^{p}(1-P)^{q}\,N\,F(P)$$

of these cases the first sample contains p black and q white balls. The total number 
of the N draws which will give p black and q white balls will therefore tend to 

$$S_P\left\{\frac{n!}{p!\,q!}\,P^{p}(1-P)^{q}\,N\,F(P)\right\} = M_p,$$

where $S_P$ means the summation for all existing constitutions of the L bags. Hence 
of the $M_p$ draws which gave p and q the proportion which came from “P-bags” 
will tend to 

$${}_{p}C_{P} = \frac{P^{p}(1-P)^{q}\,F(P)}{S_P\{P^{p}(1-P)^{q}\,F(P)\}}.$$
Now consider the second sample of m balls. We have: 

Number of draws which gave p and q in first sample → $M_p$. 

Number of these which came from “P-bags” → $M_p \cdot {}_{p}C_{P}$. 

Number of these last draws which will give r black and s white in the second 
sample → 

$$\frac{m!}{r!\,s!}\,P^{r}(1-P)^{s}\,M_p\,{}_{p}C_{P}.$$

Hence confining our attention to the $M_p$ draws, the number which will give 
r and s in the second sample will tend to 

$$S_P\left\{\frac{m!}{r!\,s!}\,P^{r}(1-P)^{s}\,M_p\,{}_{p}C_{P}\right\} = M_{pr},$$

and the “chance” of getting r and s after p and q, or the value to which the 
proportion of times that this will happen will tend as N is increased, is seen to be 

$$\frac{M_{pr}}{M_p} = \frac{m!}{r!\,s!}\,\frac{S_P\{P^{p+r}(1-P)^{q+s}\,F(P)\}}{S_P\{P^{p}(1-P)^{q}\,F(P)\}} \qquad \text{(i a)}.$$

Put in this way the problem is as definite as that of drawing from a single bag, 
where the distribution of the results of successive draws is given by the terms of 
the binomial. Instead of drawing from a single bag we are now drawing, as it 
were, from a bag of bags. 
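
The "bag of bags" argument can be checked numerically. The sketch below is only an illustration under stated assumptions: the list `bags` is a hypothetical set of five constitutions (F(P) is carried by repeating a value of P), and the function names `chance_ia` and `simulate` are ours. It compares the ratio given by relation (i a) with the proportion observed in a simulated run of N double samples.

```python
import random
from math import comb

def chance_ia(p, q, r, s, bags):
    """Relation (i a) for a finite list of bag proportions (one entry per bag)."""
    # m!/(r! s!) equals comb(r + s, r); repeated entries in `bags` play the part of F(P).
    num = sum(P ** (p + r) * (1 - P) ** (q + s) for P in bags)
    den = sum(P ** p * (1 - P) ** q for P in bags)
    return comb(r + s, r) * num / den

def simulate(p, n, r, m, bags, draws, seed=0):
    """Proportion of double samples giving r in m, among those that gave p in n."""
    rng = random.Random(seed)
    hits = total = 0
    for _ in range(draws):
        P = rng.choice(bags)                          # a bag chosen at random
        first = sum(rng.random() < P for _ in range(n))
        if first != p:
            continue                                  # keep only the M_p draws
        total += 1
        second = sum(rng.random() < P for _ in range(m))
        hits += (second == r)
    return hits / total, total

bags = [0.1, 0.3, 0.3, 0.5, 0.9]                      # hypothetical constitutions
theory = chance_ia(p=1, q=3, r=1, s=2, bags=bags)
observed, kept = simulate(p=1, n=4, r=1, m=3, bags=bags, draws=200_000)
```

As N (here `draws`) is increased, `observed` should settle near `theory`, which is all that the word "chance" is taken to mean in the argument above.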





Consider the expression 

$$S_P\{P^{p+r}(1-P)^{q+s}\,F(P)\},$$

and write $z = P^{p+r}(1-P)^{q+s}$, $y = F(P)$. 

Suppose that the number of bags with which the experimenter is surrounded 
be very large, and that the values of P are distributed at vanishingly small intervals 
of h between 0 and 1, so that there are $y_0$ for which P = 0, $y_1$ for which P = h, ... $y_l$ 
for which P = 1, where hl = 1. Then if F(P) is a continuous function with ordinates 
$y_0, y_1, \dots, y_l$ at intervals of h, we have from the Euler-Maclaurin Theorem 

$$h\,\mathop{S}_{P=0}^{1}(zy) = \int_0^1 zy\,dP + \tfrac{1}{2}h\,(z_0y_0 + z_ly_l) + \frac{h^2}{12}\left\{\left(\frac{d(zy)}{dP}\right)_l - \left(\frac{d(zy)}{dP}\right)_0\right\} - \dots,$$

where the subscripts 0 and l indicate the values of the functions or their derivatives 
at P = 0 and P = 1 = lh. 


Hence the relation (i a) may be written 

$$C_{pr} = \frac{m!}{r!\,s!}\,\frac{\int_0^1 P^{p+r}(1-P)^{q+s}\,F(P)\,dP}{\int_0^1 P^{p}(1-P)^{q}\,F(P)\,dP} \qquad \text{(i)},$$

provided that F(P) is a continuous function and that in the limit as h → 0, the 
expressions 

$$hzy,\quad h^2\,\frac{d(zy)}{dP},\quad h^4\,\frac{d^3(zy)}{dP^3},\ \dots$$

vanish both when P = 0 and P = 1, for all values of p + r from 0 to n + m. 

$C_{pr}$ is then the chance of getting r and s in a sample of m after finding p 
and q in a sample of n, under the conditions of sampling outlined above. 


We have now to consider whether this analogy of the bags of balls can be 
made to correspond in any way with the problems of practical statistics. What is 
a bag of balls, and what the frequency distribution F(P)? I am here making no 
use of Bayes’ idea of an underlying variate x with a limiting value ξ; a bag of 
balls represents a population that we are sampling, the individuals in which are 
divided into two categories, those that have and those that have not the character 
under consideration. The proportions are as P to Q. In speaking of random 
sampling we mean, I think, that there is no correlation between the order of our 
choice (whether in space or time), and the underlying factors which determine the 
character or event in question. It has been argued that outside the realm of bags 
of balls, few or no populations exist of sufficient stability to justify the application 
of Bernoulli's Theorem. Yet it seems now hardly necessary to provide evidence to 
prove that the stability of statistical ratios is a matter of practical reality, so that 
cases occur again and again in which the trained statistician can feel reasonably 
confident that he is taking random samples from a single population. The experi- 
ments which follow will however provide some further evidence on this point. 


It is when we come to consider the distribution F(P) that we reach the crucial 
point of the whole problem. Can we find any clear correspondence between the 
frequency distribution of “P-bags” and the distributions and chances of experience? 
In reaching equation (i) I have dealt as far as possible in terms of frequencies rather 
than probabilities, and it will be clearest to continue in this manner. Just as in 
dealing with the binomial the “chance” of a single draw can only really be 
interpreted by reference to the results of a series of drawings, so here the expression $C_{pr}$ 
is only intelligible in connection with a series of trials. These trials are not now 
samples from one “bag” or one population, but samples from many “bags,” samples 
from the varied populations that may be met with in statistical experience. To 
reach the simple Bayes’ Equation, we must suppose F(P) to be constant (the 
frequency distribution to be a rectangle), so that* 

$$C_{pr} = \frac{m!}{r!\,s!}\,\frac{\int_0^1 P^{p+r}(1-P)^{q+s}\,dP}{\int_0^1 P^{p}(1-P)^{q}\,dP} = \frac{m!}{r!\,s!}\,\frac{B(p+r+1,\ q+s+1)}{B(p+1,\ q+1)} \qquad \text{(ii)}.$$

* Using the previous notation, if F(P) = c, then $zy = cP^{p+r}(1-P)^{q+s}$, and the differentials vanish 
after the (n + m)th, so that there are only a finite number of terms $hzy,\ h^2\,d(zy)/dP,\ \dots$ whose value can 
be made as small as we please at P = 0 and P = 1 by making h → 0. Hence (i a) may be written as (i). 
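
To make (ii) concrete, the following Python sketch evaluates $C_{pr}$ exactly with rational arithmetic, using the fact that the B-function of positive integers reduces to factorials. The names `beta_int` and `bayes_chance`, and the particular values n = 20, m = 15, p = 8, are illustrative assumptions.

```python
from fractions import Fraction
from math import comb, factorial

def beta_int(a: int, b: int) -> Fraction:
    """B(a, b) for positive integers: (a-1)! (b-1)! / (a+b-1)!."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))

def bayes_chance(p: int, q: int, r: int, s: int) -> Fraction:
    """C_pr of (ii): chance of r, s in a second sample after p, q in a first."""
    # m!/(r! s!) with m = r + s is the binomial coefficient comb(r + s, r).
    return comb(r + s, r) * beta_int(p + r + 1, q + s + 1) / beta_int(p + 1, q + 1)

n, m, p = 20, 15, 8
q = n - p
# The full predicted distribution of r in the second sample of m.
dist = [bayes_chance(p, q, r, m - r) for r in range(m + 1)]
```

Two checks follow from the formula itself: the terms for r = 0, 1, ..., m sum to unity, and with m = 1, r = 1 the expression reduces to (p + 1)/(n + 2), the Rule of Succession quoted earlier.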
If this expression is to be of value to the statistician, he must feel confident 
that among the “bags” with which he is surrounded or the populations with which 
he is dealing, all values of P from 0 to 1 are approximately equally represented. 
Or it must be shown that some better and more general value of F(P) should be 
chosen, or alternatively that variations in F(P) hardly affect the value of $C_{pr}$. This 
is the great stumbling-block of Bayes’ Theorem; we may be better able to sur- 
mount it after considering the results of the extensive sampling to be discussed 
below, but first there are a few further points of theory to be considered. 


3. The Hypergeometrical Series. The values of $C_{pr}$, from (ii), for values of r, 
0, 1, 2, ..., m, are the successive terms of the hypergeometrical series 

$$C_0\left\{1 + \frac{m}{1!}\,\frac{p+1}{q+m} + \frac{m(m-1)}{2!}\,\frac{(p+1)(p+2)}{(q+m)(q+m-1)} + \frac{m(m-1)(m-2)}{3!}\,\frac{(p+1)(p+2)(p+3)}{(q+m)(q+m-1)(q+m-2)} + \dots\right\} \qquad \text{(iii)},$$

where 

$$C_0 = \frac{\Gamma(n+2)\,\Gamma(q+m+1)}{\Gamma(n+m+2)\,\Gamma(q+1)}.$$

This series may be represented by a histogram whose momental constants are* 

$$\left.\begin{aligned}
\text{Mean} &= \frac{m(p+1)}{n+2}, \text{ measured from the centre of first block},\\
\mu_2 &= \frac{m(p+1)(q+1)(n+m+2)}{(n+2)^2(n+3)},\\
\beta_1 &= \frac{(q-p)^2(n+2m+2)^2}{\mu_2\,(n+2)^2(n+4)^2},\\
\beta_2 &= \frac{1}{\mu_2\,(n+4)(n+5)}\Big\{(n+1)(n+2) + 6m(n+m+2)\\
&\qquad + \frac{3(p+1)(q+1)}{(n+2)^2}\big[(n+2)^2(m-2) + m^2(n+2) - 6m(n+m+2)\big]\Big\}
\end{aligned}\right\} \qquad \text{(iv)}.$$

Subject to certain limitations†, the histogram of the hypergeometrical series 
(iii) can therefore be represented by a Pearson curve having the moments of (iv). 
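
The momental constants of (iv) can be verified against the series itself. The sketch below, offered only as a check under the assumption that F(P) is constant, computes the terms of (iii) as exact fractions, takes moments directly about the mean, and sets them beside the closed expressions. The function names and the example values p = 8, q = 12, m = 15 are our own.

```python
from fractions import Fraction
from math import comb, factorial

def series_terms(p, q, m):
    """Terms of (iii): C_pr for r = 0..m under a uniform F(P), as exact fractions."""
    def B(a, b):
        return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))
    return [comb(m, r) * B(p + r + 1, q + m - r + 1) / B(p + 1, q + 1)
            for r in range(m + 1)]

def moments_direct(terms):
    """Mean, mu2, beta1 = mu3^2/mu2^3 and beta2 = mu4/mu2^2 taken from the terms."""
    mean = sum(r * c for r, c in enumerate(terms))
    mu = [sum((r - mean) ** k * c for r, c in enumerate(terms)) for k in (2, 3, 4)]
    return mean, mu[0], mu[1] ** 2 / mu[0] ** 3, mu[2] / mu[0] ** 2

def moments_iv(p, q, m):
    """The same four constants from the closed expressions of (iv)."""
    n = p + q
    mean = Fraction(m * (p + 1), n + 2)
    mu2 = Fraction(m * (p + 1) * (q + 1) * (n + m + 2), (n + 2) ** 2 * (n + 3))
    beta1 = Fraction((q - p) ** 2 * (n + 2 * m + 2) ** 2,
                     (n + 2) ** 2 * (n + 4) ** 2) / mu2
    inner = (n + 2) ** 2 * (m - 2) + m ** 2 * (n + 2) - 6 * m * (n + m + 2)
    beta2 = ((n + 1) * (n + 2) + 6 * m * (n + m + 2)
             + Fraction(3 * (p + 1) * (q + 1), (n + 2) ** 2) * inner
             ) / (mu2 * (n + 4) * (n + 5))
    return mean, mu2, beta1, beta2

direct = moments_direct(series_terms(8, 12, 15))
closed = moments_iv(8, 12, 15)
```

Because the arithmetic is exact, agreement between `direct` and `closed` is a term-by-term confirmation of the expressions in (iv) for the case tried, not merely a numerical coincidence.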


* See Phil. Mag. 1907, Vol. XIII. p. 370, and also Biometrika, Vol. XVI. pp. 157–162, where the 
moments of the hypergeometrical series are given with somewhat different notation. In Biometrika, 
Vol. XIII. pp. 9–14, K. Pearson obtains the equation of a Type I curve corresponding to the 
hypergeometrical series by putting an approximate ratio of slope to ordinate in the general differential 
equation, but in general slightly better results are obtained by fitting from moments. 

† These moments have not been corrected for grouping or abruptness, and therefore the 
correspondence between histogram and curve will not be satisfactory if m is very small or if there is marked 
abruptness at one or both of the tails. 


Now let us consider the effect of introducing a function F(P) into (ii) as in (i). 
The curve chosen must lie entirely between P = 0 and P = 1, and we will suppose 
it to be symmetrical about P = ½, so that the number of “bags” with a proportion 
of P black balls is equal to that with a proportion of P white balls. The function 

$$y = \frac{\Gamma(2a+2)}{\{\Gamma(a+1)\}^2}\,P^{a}(1-P)^{a} \qquad \text{(v)}$$

will provide us with a considerable range of unimodal curves. We know that 

$$\int_0^1 y\,dP = 1,$$

and further that 

(a) for −1 < a < 0, (v) represents a series of U-curves with infinite ordinates at 
P = 0 and 1, but a finite area underneath; 

(b) for 0 < a < +1, the curves are inverted U’s with tangents parallel to the 
y-axis at P = 0 and 1; 

(c) for +1 < a, the curves are of the “cocked hat” type touching the P-axis at 
0 and 1. 
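
The three forms (a), (b), (c) can be seen by evaluating (v) at a few points. The following Python sketch is illustrative only; the function names, the sample values of a, and the mid-ordinate rule used for the area are our own choices rather than anything in the original.

```python
from math import gamma

def type_one_symmetric(P: float, a: float) -> float:
    """Ordinate of curve (v): Gamma(2a+2)/Gamma(a+1)^2 * P^a (1-P)^a, for a > -1."""
    c = gamma(2 * a + 2) / gamma(a + 1) ** 2
    return c * P ** a * (1 - P) ** a

def area(a: float, steps: int = 20_000) -> float:
    """Mid-ordinate estimate of the area under (v) between P = 0 and P = 1."""
    h = 1.0 / steps
    return h * sum(type_one_symmetric((k + 0.5) * h, a) for k in range(steps))

# Ordinates near the ends and at the centre for one curve of each form.
shapes = {a: [type_one_symmetric(P, a) for P in (0.01, 0.5, 0.99)]
          for a in (-0.5, 0.5, 2.0)}
```

For a = −0.5 the end ordinates exceed the central one (a U-curve), while for a = 0.5 and a = 2 the centre is highest; a = 0 gives the rectangle y = 1 of the simple Bayes' case.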


The general form of these three types of curves is shown in Fig. 1. If the 
curve of (v) is to be extended to represent the distribution of “chances” among 
the populations of general experience, it is almost certain that one of the forms 
(a) or (b) will be the most appropriate. For (c) implies that the frequency of 
“chances” between 0 and some small fraction δP is vanishingly small, which 
certainly appears contrary to general experience. 





























Fig. 1. Different forms of Type I Curves. 











If F(P) is represented by (v), or $y = cP^{a}(1-P)^{a}$, is it permissible to pass 
from (i a) to (i) by the Euler-Maclaurin bridge? In the first place, if −1 < a < 0, 
(v) can only be appropriate if there are no populations with P = 0 exactly, for the 
proof has been developed supposing N F(P) to be always finite. We must suppose 
that the values of P are again distributed at intervals of h, but that there are 
now $y_{1/2}$ cases with P = ½h, $y_{3/2}$ with P = 3h/2, ... $y_{l-1/2}$ with P = 1 − ½h, where h can be 
made as small as we please but never quite vanishes. Then the Euler-Maclaurin 
Theorem gives 

$$h\,\mathop{S}_{P=\frac{1}{2}h}^{1-\frac{1}{2}h}(zy) = \int_{\frac{1}{2}h}^{1-\frac{1}{2}h} zy\,dP + \tfrac{1}{2}h\,\Big\{(zy)_{\frac{1}{2}h} + (zy)_{1-\frac{1}{2}h}\Big\} + \frac{h^2}{12}\left\{\left(\frac{d(zy)}{dP}\right)_{1-\frac{1}{2}h} - \left(\frac{d(zy)}{dP}\right)_{\frac{1}{2}h}\right\} - \dots$$
The most unfavourable case for the vanishing of the limit terms will be when 
p + r = 0, at the limit P = ½h. Then 

$$zy = cP^{a}(1-P)^{n+m+a},$$

$$\frac{d(zy)}{dP} = c\,\{aP^{a-1}(1-P)^{n+m+a} - (n+m+a)\,P^{a}(1-P)^{n+m+a-1}\}, \text{ etc.}$$

Putting P = ½h, we have 

$$\text{Limit}\ (h \to 0)\ \ hzy = (\tfrac{1}{2})^{a}\,c\,h^{1+a},$$

$$\text{Limit}\ (h \to 0)\ \ h^2\,\frac{d(zy)}{dP} = c\,\{a\,(\tfrac{1}{2})^{a-1}h^{1+a} - (n+m+a)\,(\tfrac{1}{2})^{a}h^{2+a}\}.$$

But as −1 < a < 0, these expressions and also those for the higher differentials 
can be made as small as we please by reducing h. 

Now $\int_0^1 zy\,dP = c\int_0^1 P^{a}(1-P)^{n+m+a}\,dP$, which is a B-Function with a finite 
value; also zy is a continuous function in the range P = ½h to 1 − ½h. Hence as 
h is decreased, 

$$\int_{\frac{1}{2}h}^{1-\frac{1}{2}h} zy\,dP \to \int_0^1 zy\,dP \quad\text{and}\quad h\,\mathop{S}_{P=\frac{1}{2}h}^{1-\frac{1}{2}h}(zy) \to \int_0^1 zy\,dP.$$

It follows that, in general, 

$$h\,\mathop{S}_{\frac{1}{2}h}^{1-\frac{1}{2}h}\{cP^{p+r+a}(1-P)^{q+s+a}\} \to \int_0^1 cP^{p+r+a}(1-P)^{q+s+a}\,dP \quad\text{as } h \to 0,$$

for all values of p + r between 0 and n + m, and h being a common factor in 
numerator and denominator, we obtain from (i a), 

$$C_{pr} = \frac{m!}{r!\,s!}\,\frac{\int_0^1 P^{p+r+a}(1-P)^{q+s+a}\,dP}{\int_0^1 P^{p+a}(1-P)^{q+a}\,dP} = \frac{m!}{r!\,s!}\,\frac{B(p+r+a+1,\ q+s+a+1)}{B(p+a+1,\ q+a+1)} \qquad \text{(vi)}.$$

In the distribution of populations, $cP^{a}(1-P)^{a}\,\delta P$ represents the frequency of 
cases in which P lies between P and P + δP, but there are no populations in 
which P is exactly zero or unity. 
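
Expression (vi) is readily evaluated for non-integral a by working with logarithms of Γ-functions. The sketch below is an illustration under our own choice of names (`log_beta`, `chance_vi`) and of example values n = 20, m = 15, p = 2; it is not a computation taken from the paper.

```python
from math import comb, exp, lgamma

def log_beta(x: float, y: float) -> float:
    """Natural logarithm of the B-function B(x, y) for x, y > 0."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)

def chance_vi(p: int, q: int, r: int, s: int, a: float = 0.0) -> float:
    """C_pr of (vi) when F(P) = c P^a (1 - P)^a with a > -1; a = 0 gives (ii)."""
    log_ratio = (log_beta(p + r + a + 1, q + s + a + 1)
                 - log_beta(p + a + 1, q + a + 1))
    return comb(r + s, r) * exp(log_ratio)

n, m, p = 20, 15, 2
q = n - p
# Predicted distributions of r for a U-curve, the rectangle, and an inverted U.
by_a = {a: [chance_vi(p, q, r, m - r, a) for r in range(m + 1)]
        for a in (-0.5, 0.0, 0.5)}
means = {a: sum(r * c for r, c in enumerate(dist)) for a, dist in by_a.items()}
```

In each case the mean of the predicted distribution equals m(p + a + 1)/(n + 2a + 2), the value obtained from the mean of (iv) on writing p + a for p and n + 2a for n.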

From (vi) we can reach the hypergeometrical series corresponding to (iii) and 
the moments corresponding to (iv) by substituting 

p + a for p, q + a for q, n + 2a for n. 

Now if p and q are both large compared with a, that is compared with unity, 
the introduction of F(P) will not seriously modify the form of curve given by the 
moments of (iv). For example, consider the change in the first two momental 
constants. 

The Mean. 

$$\nu_1' = \frac{m(p+a+1)}{n+2a+2} = \frac{m(p+1)}{n+2}\left(1+\frac{a}{p+1}\right)\left(1+\frac{2a}{n+2}\right)^{-1} = \nu_1\left(1+\frac{a}{p+1}-\frac{2a}{n+2}\right),$$

if squares and higher powers of $\dfrac{a}{p+1}$ are negligible. 

The Second Moment. By a similar process, it is found that 

$$\mu_2' = \mu_2\left(1+\frac{a}{p+1}+\frac{a}{q+1}+\frac{2a}{n+m+2}-\frac{4a}{n+2}-\frac{2a}{n+3}\right),$$

if the higher powers of $\dfrac{a}{p+1}$ and $\dfrac{a}{q+1}$ are neglected. 

If n, the size of the first sample, is of the order of 100 or so, and neither p nor q 
is less, let us suppose, than 20, it is clear that the modifications in $\nu_1$ and $\mu_2$ will 
be very small. In fact they are only serious when p (or q) approaches zero, and 
the relations above give us a first order approximation 

$$\nu_1' = \nu_1\left(1+\frac{a}{p+1}\right), \qquad \mu_2' = \mu_2\left(1+\frac{a}{p+1}\right).$$

It will be better to turn to the actual sampling before considering further what 
this result implies, and whether in fact the F'(P) distribution can be represented 
by a Type I curve. 
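
The size of these modifications may be judged from a worked case. In the sketch below the values n = 100, m = 50, p = 30, a = −0.5 are hypothetical, chosen to fall within the conditions mentioned in the text (n of the order of 100, neither p nor q less than 20); the function names are ours. The exact mean and second moment, obtained by the substitution p + a, q + a, n + 2a, are compared with the first-order expressions.

```python
def exact_mean(p, n, m, a=0.0):
    """Mean of (iv) after the substitution p + a for p, n + 2a for n."""
    return m * (p + a + 1) / (n + 2 * a + 2)

def exact_mu2(p, n, m, a=0.0):
    """Second moment of (iv) after the substitution p + a, q + a, n + 2a."""
    P, Q, N = p + a, (n - p) + a, n + 2 * a
    return m * (P + 1) * (Q + 1) * (N + m + 2) / ((N + 2) ** 2 * (N + 3))

def first_order_mean(p, n, m, a):
    """First-order expression for the modified mean."""
    return exact_mean(p, n, m) * (1 + a / (p + 1) - 2 * a / (n + 2))

def first_order_mu2(p, n, m, a):
    """First-order expression for the modified second moment."""
    q = n - p
    corr = (a / (p + 1) + a / (q + 1) + 2 * a / (n + m + 2)
            - 4 * a / (n + 2) - 2 * a / (n + 3))
    return exact_mu2(p, n, m) * (1 + corr)

# Hypothetical case: n about 100, p and q both at least 20, a U-shaped F(P).
n, m, p, a = 100, 50, 30, -0.5
rel_change_mean = exact_mean(p, n, m, a) / exact_mean(p, n, m) - 1
rel_change_mu2 = exact_mu2(p, n, m, a) / exact_mu2(p, n, m) - 1
```

In this case both relative changes are under one per cent., and the first-order expressions reproduce the exact values closely, which is the sense in which F(P) "will not seriously modify" the curve when p and q are large.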


4. The Experimental Sampling. The scheme of the experiment was as follows: 
first a certain proposition was fixed upon, e.g. taxicabs in London streets whose 
registration letter is LX. Then a sample of n taxis was observed in which it was 
noted that the proposition was true p times and untrue q times; then a second 
sample of m was observed in which the numbers were now r and s. After fixing 
upon a large number of different propositions and observing in each case the 
constitution of a double sample (first of n and then of m) the distribution of 
observed values of r could be compared with the theoretical distributions which 
on Bayes’ Theorem the knowledge of n and p should enable us to predict. Two 
series of samples were collected; in the first a very large number were obtained 
with a fixed size of first and second samples, viz. n = 20, m = 15; the second consists 
of comparatively few samples, but with values of n and m which vary from 15 
to 600. 


Series I. While the size of the first and second samples might be fixed at will, 
it was impossible to choose beforehand the size of p; that is to say, it was necessary 
to consider the distribution of r in second samples for all values of p from 0 to n, 
or rather from 0 to ½n, for we can class together cases in which the counts for the 
first sample give p and n − p “successes*.” It followed that if the number of 
observations for each of these values of p was to be large enough to provide a 
satisfactory comparison of theory with observation, the value of n selected must 
not be too large, or the labour involved in collecting the material would have 
become prohibitive. 

* Whether we suppose we have found p out of n or n − p out of n in the first sample, the frequency 
curve for the distribution of r in the second sample of m will be identical but reversed; e.g. the chance 
of finding 6 out of 15 after finding 8 out of 20, is the same as that of obtaining 9 out of 15 after 
12 out of 20. Hence we may always take p as the size of the smaller class into which the sample of n 
is divided, and in second samples have only to deal (on the Bayes’ hypothesis) with curves having 
positive skewness. 

The values chosen were n = 20, m = 15, which provided 
11 distributions for r in the second samples corresponding to p = 0, 1, 2 ... 10. 
12,448 double samples of 20 and 15 were collected, representing 435,680 single 
observations, and giving an average frequency of the order of 1000 values of r for 
each of the 11 values of p. The samples were taken from a very wide field, partly 
because this made it easier to obtain them in large numbers, but also because 
evidence as to the stability of statistical ratios obtained from material of very 
great variety would be of more value than that obtained from some limited field 
of inquiry. The following outline will give an idea of the main sources of the 
data : 
(1) Casual observations in London streets and elsewhere. For example: 


(a) I walk down the Euston Road and count how many men of the first 20 
I meet are smoking a pipe (= p), and then how many out of a succeeding 15 (= r). 

(b) From a window in Gower Street I observe how many vehicles out of the 
first 20 that pass below are drawn by horses, and then how many out of a later 
sample of 15. 

In taking these samples, the second sample was not necessarily observed 
immediately after the first, but of course it was necessary to be careful not to 
allow any obvious change in population constitution to occur in the interval. For 
instance the proportion of taxis to all motors is likely to be considerably higher 

in the Strand than in the Old Kent Road, and one would not take a sample of n 
from the first and m from the second district. But in all practical sampling this 
element of uncertainty arises; risks must sometimes be taken, and it was part of 
the present inquiry to find out how far in the long run the theoretical predictions 
were modified by such changes in the sampled population that a reasonable degree 
of caution cannot prevent sometimes slipping through. 


(2) Hackney Stud Books, Guernsey Herd Books, Greyhound Stud Books. To 
ensure randomness in sampling, use was made of the alphabetical arrangements 
of the animals according to name; p might here be in one case the number of 
chestnut colts born of bay mares in a first sample of 20, and r the same thing in 
a sample of 15. Or the proposition might be hounds with fawn in their coats.


(3) Letters and words in books; a great range and variety. For example: 

(a) p is the number of times in which the second word in the top line of the
first 20 pages of Carlyle’s “Latter Day Pamphlets” is a verb; r the same for
the next 15 pages.

(b) p is the number of times in which the last word on each of 20 successive
pages of the “Faerie Queene” contains the letter “e”; r is the same thing for 15
other pages.


In these samples we are testing the frequency of a certain character in the 
English language, or it may be the language as modified in the style of different 
authors. 
(4) The Census and the Registrar General’s Annual Reports for England and
Wales. Here also there were a great variety of possible methods of sampling from 
the different tables. For example : 

(a) Registrar General's Annual Report, 1920, pp. 126—139. The number of
cases in which female deaths exceeded male deaths in samples of towns or districts 
chosen randomly by help of the alphabetical arrangement of names. 28 double 
samples could be obtained from each group of towns, one pair for each of the 28 
age-at-death groups. 

(b) Ibid. pp. 36—39. The number of instances in which deaths from the 
specified causes were above or below certain arbitrarily chosen limits, the samples 
of towns again usually chosen by alphabetical order. 





In selecting the towns in this way it is true that we are not sampling from 
anything approaching an infinite population, and if town A has appeared in the 
first sample it does not appear in the second. This is frequently the case in 
practical sampling, and I do not think it vitiates the result ; we may perhaps look 
on the two samples as drawn from the hypothetical population of an immensely 
greater Britain subjected to the same general conditions. We are only using the 
same material for a single double sample, and do not fall into the error which 
arises when repeated samples are taken without proper replacement from a limited 
population whose known constitution is made use of in calculating the theoretical 
frequencies to be expected in random sampling. 


(5) Information regarding Births from “The Times” of 1895*. Here the
characters observed were the proportion of male to female births, the frequency
of twins, the number of days’ interval between birth and announcement in the
paper, etc.

The results of the 12,448 double samples classified according to the values of
p and r are shown in Table I†. It will be well first to obtain from these some idea
of the form of distribution of proportions or chances, P, among the populations
that have been sampled.


Suppose that this distribution is represented by F(P). Then on the assumption
that the two samples have been drawn at random from the same population,
the ratio (p + r)/(n + m) or (1/35)(p + r) will certainly give a likely value for P. How
far will the distribution of this ratio correspond to the distribution of F(P)? In
drawing a large number, N, of double samples we shall take a number which
approaches in the limit N.F(P) from “P-bags” or populations containing a
proportion P of marked individuals. Of these in the limit,

$$N\,F(P)\,\frac{(n+m)!}{a!\,(n+m-a)!}\,P^{a}(1-P)^{n+m-a}$$

* Data collected for another purpose by Mr P. F. Everitt.

† The distribution of r for p = 10 is made symmetrical, since it depended too much on the whim of the
observer whether an observation p = 10, r = 6 (say) was recorded as (10, 6) or (10, 9), for any deductions
to be drawn from the symmetry or asymmetry of the recorded distribution.
TABLE I.
Observed Distribution of p and r among the 12,448 double samples. 


Values of p. 




















0 1 | 2 3 4 | 5 6 7 $ | 9 10 

o | 780| 453; gs9/ 135! o2| 47| a7| un rT et. 4 

1 315 | 406 343] 250! 173; 115 68 | 34 i2| 10 | 3 

2 | 126 255, 311] 289] 231] 183] 143] 77 0| 27 | 13 

3 46 | 153 202; 220| 241] 193] 176| 115! 88] 55 | 30 

4 | 16 47 128/| 163!| 194; 194] 174! 150! 164] 102 | 7 

e + & 27 50| 103) 118] 196 | 219) 193 185 | 130 93 
» 6 l 11 22 48 92 124) M0) 178) 173 | 135 | 120 
- 7 _ 1 5 11} 49| 78/] 110] 135! 148| 153 | 172 
a a4 — 4 8 31 47} 62| 95) 100) 131 | [172] 
a 9 | — 3 1; 12; 2 35) 37) 76) (90 {io 
i 10 | —_ l 1 | 3 14 17 | 21 60 63 | [93] 
SS a ee oe 4 3; a; 14! 27] 36] Iv 

Ee Bae Boe, Pras br l 1| 2 3 8| 16 | [30 

13 | a oe ee ee 1 | fis) 

Lh i — | — | —j|j}— iM ei 

15 | — — = —— | 
Totals| 1295 | 1356 | 1359 | 1229 | 1211 1215 | 1174 | 1063 | 1087 | 955. | [1008] 

















will contain a marked individuals. Hence it follows that in the N double samples
the number in which p + r = a will tend to

$$L_a = \sum_{P} N\,F(P)\,\frac{(n+m)!}{a!\,(n+m-a)!}\,P^{a}(1-P)^{n+m-a}
      = N\,\frac{(n+m)!}{a!\,(n+m-a)!}\int_0^1 F(P)\,P^{a}(1-P)^{n+m-a}\,dP \qquad \text{(vii)},$$

if F(P) may be taken as continuous and the conditions of p. 392 are satisfied.
Consider different forms of F(P).
(a) Suppose all values of the proportions among the populations sampled are
equally common, or F(P) = 1. Then

$$L_a = N\,\frac{(n+m)!}{a!\,(n+m-a)!}\int_0^1 P^{a}(1-P)^{n+m-a}\,dP
      = N\,\frac{(n+m)!}{a!\,(n+m-a)!}\,\frac{\Gamma(a+1)\,\Gamma(n+m-a+1)}{\Gamma(n+m+2)}
      = \frac{N}{n+m+1}.$$

That is to say in N samples the observed values of p + r will tend to be distributed
uniformly among the 36 values, 0, 1, 2 ... 35.
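The uniformity claimed for case (a) can be checked exactly with rational arithmetic. The sketch below is only an illustration: the function name `uniform_prior_share` is hypothetical, and it uses the standard result that the Beta integral with integer exponents equals a!(t−a)!/(t+1)!.

```python
from fractions import Fraction
from math import comb, factorial

def uniform_prior_share(a, total):
    """Exact value of C(total, a) * integral_0^1 P^a (1-P)^(total-a) dP."""
    # Beta integral for integer exponents: a! (total-a)! / (total+1)!
    beta = Fraction(factorial(a) * factorial(total - a), factorial(total + 1))
    return comb(total, a) * beta

n, m, N = 20, 15, 12448
# Share of double samples expected at each value a = p + r when F(P) = 1
shares = [uniform_prior_share(a, n + m) for a in range(n + m + 1)]
expected = [float(N * s) for s in shares]
print(shares[0], round(expected[0], 2))
```

Every share comes out as 1/(n+m+1), so with N = 12,448 each of the 36 values of p + r would be expected about 346 times if the populations sampled were uniformly distributed in P.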


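(b) Suppose

$$F(P) = \frac{\Gamma(2(\alpha+1))}{\{\Gamma(\alpha+1)\}^{2}}\,P^{\alpha}(1-P)^{\alpha} \qquad \text{(v) bis}.$$

Then

$$L_a = N\,\frac{(n+m)!}{a!\,(n+m-a)!}\,\frac{\Gamma(2(\alpha+1))}{\{\Gamma(\alpha+1)\}^{2}}\,
\frac{\Gamma(a+\alpha+1)\,\Gamma(\alpha+n+m-a+1)}{\Gamma(n+m+2\alpha+2)} \qquad \text{(viii)}.$$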
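Expression (viii) is easy to evaluate with log-gamma functions. The following is a minimal sketch, assuming the exponent −.2266 that the paper adopts later for the fitted U-curve, and assuming that the 18 classes of Table II pool the values a and 35 − a (the table is described as "made symmetrical"); the function name is hypothetical.

```python
from math import exp, lgamma

def u_curve_frequencies(alpha, n, m, N):
    """Expected counts L_a, a = 0..n+m, from (viii) for F(P) ~ P^alpha (1-P)^alpha."""
    t = n + m
    # Terms of (viii) that do not depend on a
    log_const = (lgamma(2 * (alpha + 1)) - 2 * lgamma(alpha + 1)
                 - lgamma(t + 2 * alpha + 2))
    out = []
    for a in range(t + 1):
        log_binom = lgamma(t + 1) - lgamma(a + 1) - lgamma(t - a + 1)
        log_term = (log_binom + log_const
                    + lgamma(a + alpha + 1) + lgamma(t - a + alpha + 1))
        out.append(N * exp(log_term))
    return out

freq = u_curve_frequencies(alpha=-0.2266, n=20, m=15, N=12448)
# Pool a with 35 - a to obtain the 18 symmetrical classes
folded = [freq[a] + freq[35 - a] for a in range(18)]
print([round(v) for v in folded])
```

Under these assumptions the pooled values should agree closely with the "Theory (b)" column of Table II, which begins 1166, 908, 811.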
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The frequency distribution of (p+r)/(m+n) or [Iq is therefore represented by 
the terms of a hypergeometrical series which may be written 


$$L_0\left\{1 + \frac{n+m}{1!}\,\frac{\alpha+1}{n+m+\alpha} + \frac{(n+m)(n+m-1)}{2!}\,\frac{(\alpha+1)(\alpha+2)}{(n+m+\alpha)(n+m+\alpha-1)} + \dots\right\} \qquad \text{(ix)}.$$


Now K. Pearson has shown*, from a consideration of the ratio of slope to ordi- 
nate, that the hypergeometrical series (iii) of p. 393 above can be represented 


by a Type I curve 
xa\* a\* , 
Y= Yo (1 ad ) (1 + :) errr rere eee ree eee eee ee ee (x), 


where after a slight modification of notation, and taking ¢ the interval between 
the blocks of the hypergeometrical as unity, 


8, =n {i+ je@-ph. a= 4n{1-Zeq-ph, 


b, = (b+ e(q—p)}, b. = $ [b—e(q— p)i, 
n+2m+ 2 es 
= an ; b = Ven? — pq. 


Now (iii) becomes (ix) on writing a for p and q, 2a for n, and n+m for m, 
whence 
n+mt+at+l 
e= — —— and s,=s =a, 
2a 
b=V(in+mt+at+lyY—a@, b=b,= $b. 
If α is small compared to n+m+1, b the range of the curve approaches
n+m+1, so that (x) may be written


x“ % x y : 

Y = Yo (1 *. ni met) (1 + imine ) sornveiesoes (x1), 
where the origin is at the centre or mean. That is to say the form of distribution
of the values of p + r tends to follow a Type I curve with range lying between −½
and n + m + ½, of precisely the same form as (v) bis, the distribution of population
proportions.

(c) If the distribution of P can be represented by a high-order parabola

$$F(P) = c_0 + c_1 P + c_2 P^{2} + \dots \qquad \text{(xii)},$$

where the constants c_0, c_1, ... form a decreasing series, it can again be shown that
the distribution of (p + r)/(n + m) observed in samples tends to that of F(P)†.
* Biometrika, Vol. xu. p. 10 et seq. ; 
† $$L_a = \frac{N}{n+m+1}\left(c_0 + c_1\,\frac{a+1}{n+m+2} + c_2\,\frac{(a+1)(a+2)}{(n+m+2)(n+m+3)} + \dots\right).$$

If the constants c_0, c_1 ... decrease fairly rapidly this expression approximates but does not exactly
correspond to the form

$$\frac{N}{n+m+1}\left(c_0 + c_1\,\frac{a}{n+m} + c_2\,\frac{a^{2}}{(n+m)^{2}} + \dots\right),$$

which we should have if the distribution of L_a were a similar curve to (xii). The discrepancy will be
greatest for small values of a = p + r.
We shall therefore expect that in general the distribution of (p + r)/(n + m)
will give a fair approximation to the distribution of proportions or values of P in
the populations sampled, the difference being greatest for small values of p + r.





Fig. 2. Distribution of Frequencies of p + r in 12,448 samples of 35 (made symmetrical). (a) The histogram represents the observed distribution; (b) black circles, expected frequencies if F(P) is the U-curve of equation (xiii); (c) open circles, expected frequencies if F(P) is represented by the sloping lines of equation (xv).


The actual distribution observed in the present experiment is shown in Fig. 2 
and the frequencies in Table II. They have been made symmetrical about 
(p + r)/(n + m) = ½, for asymmetry would have no meaning*. It will be seen at
once that the distribution is not a horizontal straight line, and that the deviations 
from linearity are far greater and more systematic than could be expected to arise 
in random sampling from a series of populations in which the values of P were 


TABLE II.
Frequencies of Observed and Theoretical Values of p + r.

p + r    Observed     Theory    Theory
         Frequency     (b)       (c)
   0        780        1166      809
   1        768         908      795
   2        821         811      781
   3        779         754      767
   4        792         717      752
   5        769         689      738
   6        739         668      724
   7        727         652      709
   8        694         639      695
   9        630         628      681
  10        643         619      667
  11        670         612      653
  12        682         606      639
  13        668         601      626
  14        616         598      615
  15        568         595      605
  16        524         593      598
  17        578         592      594
Totals   12,448      12,448   12,448




















distributed equally between 0 and 1. Now the material is in a sense composite, 
being collected from various distinct sources; the average value of p + r from
source (5) was very low, while that from (4) was probably slightly higher than that 
from (3). At the same time in collecting the material there was no doubt a half- 
conscious effort to choose the discriminating characters so as to spread out the 
values of p between 0 and 10. It cannot therefore be claimed without other 
evidence that the distribution of Fig. 2 represents the distribution of proportions 
that other statisticians might find in their practical experience. For the moment 
we must be content with putting this question: Given that the distribution of 
proportions among the populations sampled was in this particular series of 
experiments such as to cause the distribution of p +r observed, how far are the 
predictions of the simple form or a modified form of Bayes’ Theorem borne out ? 

5. Analysis of results. We may proceed in various ways. 

(a) Let us first compare the observed distributions of r in Table I with the
theoretical distributions of r for given values of p calculated from the simple form
of Bayes’ Theorem, the C_{p,r} of Equation (ii). The theoretical frequencies are given
in Table III. If the χ² Test for Goodness of Fit is applied to the corresponding
columns of Tables I and III, we obtain the results given in Table IV†. There is
great variety in the goodness of fit, the average value of P(χ²) being .228. We
can reach a measure of the closeness of correspondence of the whole of Table I to

* In a series of propositions chosen completely at random we might expect theoretically to find 
symmetry, but in actual practice for convenience in counting the group frequency entered as p is 
usually the smaller of the two. For example if I find 1 bearded man and 19 beardless I shall enter p in 
the records as 1 rather than 19, for it will have been the beards that I have counted. 

† Small frequencies were clubbed together so as to give no groups in the theoretical frequencies
containing less than 10 observations. 
Table III as follows. To compare the goodness of fit of N observed to N theoretical
frequencies when there are j linear relations that must hold among the frequencies,
we must enter the Tables of Goodness of Fit with X² and N′, where

X² is the sum of all χ²'s,
N′ = N − j + 1.

In the present case X² = 150.1603, N = 103, j = 11, N′ = 93*. Whence using
the Tables of the Incomplete Γ-Function it is found that P(χ²) = .00014. That is
to say the chance that the differences between Tables I and III taken as a whole
are due to random sampling is extraordinarily small. This disagreement is not
perhaps surprising if we remember how far the p + r distribution of Fig. 2 differs
from a horizontal straight line.
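The combined test can be retraced numerically. The sketch below assumes that entering the goodness-of-fit tables with n′ corresponds to a χ² distribution with n′ − 1 degrees of freedom, and it evaluates the tail probability with the classical finite sums for integer degrees of freedom rather than with printed tables; the function name `chi2_sf` is hypothetical.

```python
from math import erfc, exp, pi, sqrt

def chi2_sf(x, df):
    """P(chi-square with integer df exceeds x), via the classical finite sums."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return exp(-half) * total
    # Odd df: two-sided normal tail plus a finite sum in powers of sqrt(x)
    chi = sqrt(x)
    total, term = 0.0, chi   # term = chi^(2r-1) / (1*3*...*(2r-1)), starting at r = 1
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return erfc(chi / sqrt(2.0)) + 2.0 * exp(-half) / sqrt(2.0 * pi) * total

# Chi-square values and n' for p = 0..10, as listed in Table IV
chi2_values = [3.1806, 8.3146, 16.0261, 22.0819, 9.0518, 14.5416,
               19.2740, 19.1419, 24.2553, 8.2975, 5.9950]
n_primes = [6, 7, 8, 9, 10, 11, 12, 11, 12, 11, 6]

X2 = sum(chi2_values)            # sum of all the separate chi-squares
N = sum(n_primes)                # total number of groups
j = len(chi2_values)             # one linear relation per column of Table I
N_prime = N - j + 1
P_combined = chi2_sf(X2, N_prime - 1)
per_column = [chi2_sf(c, k - 1) for c, k in zip(chi2_values, n_primes)]
print(round(X2, 4), N, N_prime, P_combined)
```

With these inputs X² = 150.1603, N = 103 and N′ = 93, and the combined probability is of the order of one in ten thousand, in line with the value quoted in the text.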


TABLE III. 


Theoretical Distribution of p and r among 12,448 double samples 


deduced from Bayes’ Theorem unmodified. 
y J 


Values of p. 





= 


Values of 7. 


10 
11 
12 
13 
ly 
15 


; : 
Totals | 1295°0| 1356:0} 1359-0 | 1229°0 
| | 1 


| 
75574 | 


| 323°7 


133°3 | 
52°5 | 


452°0 
| 


398°8 | ¢ 
253°8 | ¢ 


137°5 
66°5 | 
29°3 | 
118 
6°3 | 
| 





| ‘i PD. 
1| 124-9] 65-4] 33°8 
*2 | 234-1 | 158-1 | 101°5 
0 | 264-3 | 221-4 | 171°6 
"1 | 229-1 | 231-6 | 212-4 
7 | 165-9 | 198-5 | 212-4 
3 1043 | 145-6 | 179°7 
)| 57-9 93-3 | 131°8 
8| 286 | 52°8| 84:7 
8} 12°6| 26-4| 47-9 
| 73! 11-6] 23-7 

is pies 10 

; — 5 


| | 


|12110 





vy 








6 ‘ 
164) 71 
59°2 | 306 
1118-4] 715 
| 171-1 | 11971 
| 197-4 | 157°3 
191-1 | 173-0 
159°2 | 163°0 
115°7 | 133°3 
73°6 | 95:2 
7} 40°99! 59°3 
| 19°6 | 318 
8:0 | 14°5 
34 | 7:3 


| 
1215°0| l 1740) 1063-0 





| 
| 
| 
' 
| 
| 


| 1087-0! 955 








8 | 9 10 
334) — = 
170) 89) 4:5 | 
457! 235! 13-7 | 
871} 51-0) 33-5 
1307 | 865 63-9 
162°5 | 1211, 100-4 
172°3 | 144-2 133-9 
1583 | 1483 154°1 
126-6 | 132°7 | [154-1] 
88°1 103-2 | [133-9] 
52°9 | 69°2 | [100°4] | 
26°9| 39°3 [63-9] | 
111} 184 [33°5] 
5 8°7 | [13-7] | 
| Teo) 


| 


0 | [1008-0] | 





Where small tail frequencies have been grouped together, these figures 


are put below (or above) horizontal bars. 
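Entries of Table III can be regenerated from the column totals. The sketch below assumes that Equation (ii), which is not reproduced in this part of the paper, is the usual uniform-prior predictive formula C_{p,r} = [m!/(r!s!)] B(p+r+1, q+s+1)/B(p+1, q+1); the function names are hypothetical.

```python
from math import comb, exp, lgamma

def log_beta(x, y):
    """Logarithm of the Beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)

def bayes_predictive(p, r, n=20, m=15):
    """Chance of r marked in a second sample of m after p marked in n, with F(P) = 1."""
    q, s = n - p, m - r
    return comb(m, r) * exp(log_beta(p + r + 1, q + s + 1) - log_beta(p + 1, q + 1))

column_total = 1295   # number of first samples with p = 0 (total of that column)
column = [column_total * bayes_predictive(0, r) for r in range(16)]
print([round(v, 1) for v in column[:4]])
```

For p = 0 the leading values come out near 755.4, 323.7, 133.3 and 52.5, which are the figures still legible at the head of the first column of Table III.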


(b) Take now the general form of C_{p,r} of Equation (i) and put for F(P)

$$F(P) = \frac{\Gamma(2(\alpha+1))}{\{\Gamma(\alpha+1)\}^{2}}\,P^{\alpha}(1-P)^{\alpha}.$$

It has been shown above that an appropriate value for α can be obtained by
fitting a Type I curve to the observed values of p + r (Table II). Fitting by

* j = 11, corresponding to the 11 columns, p = 0, 1, 2, ... 10, for each of which the totals of theory
and observation have been made to agree.
TABLE IV.

Goodness of Fit Tests. Observation and simple Bayes’ Theorem.

   p        χ²        n′     P(χ²)
   0       3.1806      6      .673
   1       8.3146      7      .215
   2      16.0261      8      .025
   3      22.0819      9      .005
   4       9.0518     10      .433
   5      14.5416     11      .149
   6      19.2740     12      .056
   7      19.1419     11      .038
   8      24.2553     12      .012
   9       8.2975     11      .600
  10       5.9950      6      .307
Totals   150.1603    103    Average .228





equating the 2nd and 4th moments but using no corrections for abruptness I find
the equation referred to 17.5 as origin is

$$y = y_0\left(1+\frac{x}{17.38}\right)^{-.2266}\left(1-\frac{x}{17.38}\right)^{-.2266} \qquad \text{(xiii)}.$$

The range of this curve is rather too short, and the values of the constants could
certainly be bettered by using abruptness coefficients. If the ends of the curve
are fixed at −17.5 and +17.5, and α found from the 2nd moment only, we have

$$y = y_0\left(1+\frac{x}{17.5}\right)^{-.2039}\left(1-\frac{x}{17.5}\right)^{-.2039} \qquad \text{(xiv)}.$$

Taking α as −.2266 and making use of (viii), we can obtain the “expected”
frequency distribution of p + r. This is given in the 3rd column of Table II and
represented by black circles in Fig. 2. The correspondence with observation is
not satisfactory at the ends; in fact a U-shaped curve does not provide a good fit
because the frequencies of p + r appear to be almost constant between 0 and 5.
This feature will be referred to again in the second experiment.

(c) A more satisfactory fit to the p + r distribution can be obtained from two
sloping straight lines. Assuming that this distribution may be taken as an approximation
to F(P), I obtained by the method of least squares the lines*

y = 1.191182 − .764728 P for P = 0 to ½,
y = 1.191182 − .764728 (1 − P) for P = ½ to 1 ............ (xv),

which for short may be written as

y = α − βP, y = α − β(1 − P).

It is now necessary to insert this form of F(P),

* The lines shown in Fig. 2 are those obtained directly from least squares, i.e. y = 819.29 − 15.0279x
and y = 819.29 − 15.0279 (35 − x) referred to p + r = 0 as origin. The P distribution is that of x/35,
multiplied by a constant to make the area under the lines between P = 0 and 1 equal to unity.
(i) into the expression L_a of (vii), in order to get back for purposes of comparison
to the frequency distribution of p + r;
(ii) into C_{p,r} of (i), to obtain the theoretical frequency curves for the values of
r in the second samples.

L_a becomes

$$N\,\frac{(n+m)!}{a!\,(n+m-a)!}\left\{\int_0^{1/2} P^{a}(1-P)^{n+m-a}(\alpha-\beta P)\,dP
 + \int_{1/2}^{1} P^{a}(1-P)^{n+m-a}\bigl(\alpha-\beta(1-P)\bigr)\,dP\right\}$$

$$= N\,\frac{(n+m)!}{a!\,b!}\left\{\alpha\int_0^1 P^{a}(1-P)^{b}\,dP
 - \beta\left[\int_0^{1/2} P^{a+1}(1-P)^{b}\,dP + \int_{1/2}^{1} P^{a}(1-P)^{b+1}\,dP\right]\right\}$$

(writing n + m − a = b)

$$= N\,\frac{(n+m)!}{a!\,b!}\left\{\alpha B(a+1,\,b+1) - \beta B(a+2,\,b+1)
 + \beta\left[\int_{1/2}^{1} P^{a+1}(1-P)^{b}\,dP - \int_{1/2}^{1} P^{a}(1-P)^{b+1}\,dP\right]\right\}.$$

Now

$$\int_{1/2}^{1} P^{a+1}(1-P)^{b}\,dP = \left[-\frac{P^{a+1}(1-P)^{b+1}}{b+1}\right]_{1/2}^{1}
 + \frac{a+1}{b+1}\int_{1/2}^{1} P^{a}(1-P)^{b+1}\,dP
 = \frac{(\tfrac12)^{n+m+2}}{b+1} + \frac{a+1}{b+1}\int_{1/2}^{1} P^{a}(1-P)^{b+1}\,dP.$$

Hence substituting the factorial values of the B-functions,

$$L_a = N\left\{\frac{\alpha}{n+m+1} - \frac{\beta(a+1)}{(n+m+1)(n+m+2)}\right\}
 + N\beta\,\frac{(n+m)!}{a!\,b!}\left\{\frac{(\tfrac12)^{n+m+2}}{b+1}
 + \frac{a-b}{b+1}\int_{1/2}^{1} P^{a}(1-P)^{b+1}\,dP\right\}.$$

Now

$$\int_{1/2}^{1} P^{a}(1-P)^{b+1}\,dP = \frac{(\tfrac12)^{n+m+2}}{b+2}
 + \frac{a}{b+2}\int_{1/2}^{1} P^{a-1}(1-P)^{b+2}\,dP,$$

and on continuing to integrate by parts becomes finally

$$(\tfrac12)^{n+m+2}\left\{\frac{1}{b+2} + \frac{a}{(b+2)(b+3)} + \frac{a(a-1)}{(b+2)(b+3)(b+4)} + \dots
 + \frac{a(a-1)\dots 1}{(b+2)(b+3)\dots(n+m+2)}\right\}
 = (\tfrac12)^{n+m+2}\,\lambda_{n+m,\,a}, \text{ let us say.}$$

And so finally

$$L_a = N\left\{\frac{\alpha}{n+m+1} - \frac{\beta(a+1)}{(n+m+1)(n+m+2)}
 + \beta\,(\tfrac12)^{n+m+2}\,\frac{(n+m)!}{a!\,(b+1)!}\bigl[1 + (a-b)\,\lambda_{n+m,\,a}\bigr]\right\} \qquad \text{(xvi)}.$$

The series for λ_{n+m, a} was computed numerically for n + m = 35 and a = 0, 1, 2 ... 17.
The corresponding values of L_a are given in the 4th column of Table II and
have been plotted in Fig. 2 as open circles. They agree far more closely with the
observed values than those obtained from the U-curve, although the correspondence
is not very exact. Applying the χ² Test for Goodness of Fit, it is found
that P(χ²) = .02, or the chance is still much against the divergences being simply
due to random sampling.
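As a cross-check on the algebra, the expected frequencies for the two sloping lines can also be obtained by direct numerical integration of (vii) rather than from the closed form (xvi). This is only a sketch: Simpson's rule is a swapped-in technique, not the author's method, and the pooling of a with 35 − a into 18 classes is assumed from the description of Table II as made symmetrical.

```python
from math import comb

ALPHA, BETA = 1.191182, 0.764728   # constants of the two lines in (xv)

def F(P):
    """Two sloping straight lines of (xv), symmetric about P = 1/2."""
    return ALPHA - BETA * min(P, 1.0 - P)

def simpson(f, lo, hi, steps=2000):
    """Composite Simpson rule; steps must be even."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0

def expected_count(a, n=20, m=15, N=12448):
    """L_a of (vii) for the two-line F(P), by numerical integration."""
    t = n + m
    g = lambda P: F(P) * P**a * (1.0 - P)**(t - a)
    # Integrate each half separately because F has a corner at P = 1/2
    return N * comb(t, a) * (simpson(g, 0.0, 0.5) + simpson(g, 0.5, 1.0))

folded = [expected_count(a) + expected_count(35 - a) for a in range(18)]

# Observed frequencies of p + r = 0..17 from Table II
observed = [780, 768, 821, 779, 792, 769, 739, 727, 694,
            630, 643, 670, 682, 668, 616, 568, 524, 578]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, folded))
print([round(v) for v in folded])
print(round(chi2, 1))
```

The pooled values should reproduce the "Theory (c)" column of Table II to within rounding, and χ² comes out at about 31 for the 18 classes. With 17 degrees of freedom the tabulated 2.5 and 1 per cent. points are 30.19 and 33.41, which is consistent with the quoted P(χ²) = .02.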
Inserting this value of F(P) into (i), we get*

$$C_{p,r} = \frac{m!}{r!\,s!}\;
\frac{\int_0^{1/2} P^{p+r}(1-P)^{q+s}(\alpha-\beta P)\,dP + \int_{1/2}^{1} P^{p+r}(1-P)^{q+s}\bigl(\alpha-\beta(1-P)\bigr)\,dP}
     {\int_0^{1/2} P^{p}(1-P)^{q}(\alpha-\beta P)\,dP + \int_{1/2}^{1} P^{p}(1-P)^{q}\bigl(\alpha-\beta(1-P)\bigr)\,dP}.$$

For a given value of p the denominator will be constant, and we have therefore
to consider the changes with r in

$$\frac{m!}{r!\,s!}\left\{\alpha\int_0^1 P^{p+r}(1-P)^{q+s}\,dP
 - \beta\left[\int_0^{1/2} P^{p+r+1}(1-P)^{q+s}\,dP + \int_{1/2}^{1} P^{p+r}(1-P)^{q+s+1}\,dP\right]\right\},$$

which can be reduced to

$$\frac{m!}{(n+m)!}\,\frac{(p+r)!\,(q+s)!}{r!\,s!}\,\frac{L_{p+r}}{N},$$

where L_{p+r} is the expression of (xvi) with p + r written for a. Thus for a given
value of p,

$$C_{p,r} \propto \frac{(p+r)!\,(q+s)!}{r!\,s!}\,L_{p+r} \qquad \text{(xvii)},$$

and as L_a has been calculated for all values of a from 0 to 17 (and consequently
also for a = 18 to 35) all the required values of C_{p,r} can be readily computed.
These provide the theoretical distributions for frequencies of r in second samples
given in Table V. Testing the goodness of fit of the observed values of Table I

TABLE V. Theoretical Distribution of p and r among 12,448 double samples on 
assumption that F'(P) is represented by two sloping straight lines. 
Values of p. 





EARS ES S. 


— | 





18-0 | 80} 38) — _ 
63:9 335 187) 9:9 4'9 
125°2| 765, 492 253, 145 
+ | 50-3 | 133°3 | 207-3 | 227°9 | 233°5 | 217-1 | 177-2 | 124°8 | 91-8, 536, 34-7 
ry 18°5 | 63°3 | 122-0 | 161°7 | 196°1 | 212-7 200°2 | 161°3 1348 891 65°] 
5 94! 273) 63:5, 99°6 140°8 | 176-2 189-7 173-7 | 164-2 | 122°5 | 100°6 

8 


| ss 5 | 6 7 s | 9 10 
! a a | a = _— 

| 0 | 764-7 | 463-3 | 263-0! 131-6! 69-9| 36-7 | 

| 2 | 821-9) 401-5 | 352-1 | 242-1 | 165-9 | 108-1 | 
| 2 | 180°2 | 250°8 | 302-3 | 268-1 | 227-7 | 179+ 


Lj 


Poe G6 — | 108) 296 54:2) 88-4|1265 154-8 | 160-3 171-0 1435 132°6 
- 7 57 | 124 263 49-0] 796 | 110-2! 128°8 | 154-6 145-9 151-6 
2 8 — | G8) 113) 24-0) 44:1! 68-8) 90-5 | 122-2 129°7 | [151-6] 
s 9 _ = — 62} 10°3| 21-4| 37:6! 55°7| 84:5 100°9 | [132-6] 
| 20 — | — | — | B4] 9-0) 17°9| 29°7| 50-7) 68-1! [1006] 
11 fa om 16) 10° 135) 25-9 391 | [65-1] | 
12 ae . ~ 67 10:9 186) [34-7] 
3 - —-~)}-—- =—- = 47) 88! [14°5] | 
| 1h | = ~f-— fm | = +] te =) ae | 
| 15 — — 


a 4 Tee? nde Maat See Soe 


| 
. = 
Totals 1295+ 0) ) 1356-0 1359°0| 1229°0 1211-0 1215°0| 1174:0| 1063-0 1087-0| 955-0 | [1008- 0} 








Where small tail frequencies have been grouped together, these figures 
are put below (or above) horizontal bars. 


* No difficulty arises here in transforming (i) into (i). 
with these theoretical frequencies, we obtain the values of χ² and P(χ²) given in
Table VI. The average value of P(χ²) is here .338, and the general fitting is far
more satisfactory than was found when assuming the distribution of chances among
the sampled populations to be rectangular. There are only two of the observed
distributions which give a really bad fit, those for p = 3 and 8, and taken separately
probably none would be discarded as entirely unsatisfactory. As a group however
the run of values of P(χ²) is rather low, and if we apply the combined test (as on
p. 403) we find
X² = 118.1350, N = 103, j = 11, N′ = 93,


TABLE VI.

Goodness of Fit Tests. Observation and Theory when
F(P) is taken as two sloping lines.

   p        χ²        n′     P(χ²)
   0        .9575      5      .915
   1       7.5981      7      .268
   2       9.7163      8      .205
   3      15.1996      9      .055
   4       8.5659     10      .477
   5      11.7523     11      .302
   6      13.7196     12      .249
   7      15.8142     11      .102
   8      22.2513     12      .022
   9       5.8525     12      .883
  10       6.7077      6      .243
Totals   118.1350    103    Average .338





whence (using the Tables of the Incomplete Γ-Function) it is found that
P(χ²) = .0346. The odds are still much against the difference between observation
and theory arising simply from random sampling, but they are not weighted nearly
so heavily as in the previous case when P(χ²) = .00014. By fitting a more complex
form of curve to the frequency distribution of p + r, we could certainly improve
the agreement to some extent, but as it cannot be claimed that the exact form of 
this distribution with its peculiar sinuosities has a wider significance, the labour 
involved would not be worth while. 

(d) There is however another method by which we may consider the results.
Returning to the analogy of the bags of balls, we expected in the long run after
taking a very large number, N, of random double samples, to find M_p cases in which
the first sample contained p black balls, and of these a proportion tending to C_{p,r}
(of Equation (i)) in which the second sample of m contained r black balls, where

$$M_p = N\,\frac{n!}{p!\,q!}\int_0^1 F(P)\,P^{p}(1-P)^{q}\,dP.$$

The expected frequencies of cases in which p + r has a fixed value, a, are then

$$M_0C_{0,a},\quad M_1C_{1,a-1},\quad \dots\quad M_tC_{t,a-t},\quad \dots\quad M_aC_{a,0} \qquad \text{(xviii)},$$

for p ranging from 0 to a, where it will be found that

$$M_tC_{t,a-t} = \frac{n!\,m!\,N}{(a-t)!\,(m-a+t)!\,t!\,(n-t)!}\int_0^1 P^{a}(1-P)^{n+m-a}F(P)\,dP
 = \frac{1}{(a-t)!\,(m-a+t)!\,t!\,(n-t)!}\times \text{a factor independent of } t.$$

Hence the proportionate frequencies of the series (xviii) are quite independent
of the form of F(P), the term containing this function only appearing as a common
multiplying factor*. The a + 1 observed frequencies in Table I, for which
p + r = a, should therefore correspond to the a + 1 theoretical frequencies,

$$\frac{K}{0!\,n!\,a!\,(m-a)!},\quad \frac{K}{1!\,(n-1)!\,(a-1)!\,(m-a+1)!},\quad \frac{K}{2!\,(n-2)!\,(a-2)!\,(m-a+2)!},\ \dots$$

Determining K to make the sums of these a + 1 observed and theoretical
frequencies agree, the figures given in Table VII were calculated for all the
possible values of p and r.
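Because the terms above are proportional to C(n, t) C(m, a − t), a row of Table VII follows from nothing more than binomial coefficients and the observed row total. The sketch below is illustrative only (the function name is hypothetical) and is applied to a = 2, a row small enough to be unaffected by the symmetrising of large values of p.

```python
from math import comb

def split_given_total(a, total_count, n=20, m=15):
    """Expected counts of p = t (t = 0..a) among double samples with p + r = a."""
    # 1 / (t! (n-t)! (a-t)! (m-a+t)!) is proportional to C(n, t) * C(m, a-t)
    weights = [comb(n, t) * comb(m, a - t) for t in range(a + 1)]
    scale = total_count / sum(weights)   # plays the role of K
    return [scale * w for w in weights]

# 821 double samples were observed with p + r = 2 (Table II)
row = split_given_total(a=2, total_count=821)
print([round(v, 1) for v in row])
```

The three values agree with the entries 144.9, 413.9 and 262.2 given for p + r = 2 in Table VII, and no assumption about F(P) enters the calculation.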


TABLE VII. 
Theoretical Distribution obtained by considering the proportionate frequency of p among 
double samples with a given p +r. 


Values of p. 





1 











o | 1 e|s| 4 | gie« riw2it « | 10 | u | 22 | 13 | 14 \'Totals 
= fa SESE See Ee oe Se aK Bebe ~aintlicesopall 
"es penn or wy eo a Be = 7800} * 
ee ee ee ee ee eee ee ee eee ee — | — |—| 70 
2 | 144-9 | 413°9 | 262-2) — —|—/;]/—]—}]—|-— ] — |—| 8210 
S| 54:1] 250°0| 3392/1307; — | — | —| —| -~|—|—]| —|— | — | f° 
y 7 | 137°6 | 301°8 | 258-7| 732) — | — | — | — | } — | — | | — |—| eee 
5 ‘1 | 64°7 | 204-8 | 283-5 | 172-2 | 36°7 | — | —|};-—-|- - | — | —| 7690) 
6 | 23! 27-3] 118-1 | 236-2 | 231-6 | 105-9] 176) — | cet Head lewal eat Oe 739°0 
7 | 11°5]| 61-7 | 168-2 | 238-°3|176°0| 629| 84| — | }—|—-|j-— | — | 727°0 
8 4-0]} 28-0 | 101°0 | 195-0 | 208-0 | 120-0} 343) 37) — | — | — | - - 694°0 
9 | — _ 12°0]| 50°9 | 129°8 | 188°8 | 157-4 | 72°6 | 4k od } - | — — | —_ 630°0 
10 | | 4°7]| 25-7] 849 | 163-1 | 185-3 | 123°5 | 46-3 | [95 | — |—|{- — 643°0 
11 | | — | - 13°5]) 50-0 | 124-6 | 186°9 | 169°9| 92-1) 28:3) [47/ — | — | — | — _ 670°0 
cl oe eee Be 5*1]) 25°5 | 81°6| 158°6 | 190°3 | 140°6 | 62°5| 15°8| (20) — — | — | 6820 
si — | -—- | | — "| 12-6] 45-2 | 112-9 | 175-6 | 171-1 | 103-7 | 38 0 tS9| — jf = |= 
lh Dasa % “oli 4°3]| 20:6) 66-2 | 1324 | 167-4 | 133-9 67-0 | 203) [39 | — | 616°0 | 
15 | — | | 9-4]) 33-9) 87-2| 141-8] 147-0} 97-0! 40-1 | (116) — | — | 568-0) 
| 16 | — — 30]] 15°0| 50-1 | 104°6| 139-4 | 119-4{ 65°1| 22-2| [5-2 | — | 524-0} 
| 17 | ae ie 7°7]) 29°7| 80-3 | 137-7 | 151-4 107-1 | 48-2 | 13-5 |[2-4) 5780 | 
| (12,448 | 









































Where small tail frequencies in the horizontal rows have been grouped together, 
these figures are put outside a square bracket. 
* In the theory of probability

$$\frac{a!\,(m+n-a)!}{(a-t)!\,t!\,(n-t)!\,(m-a+t)!}\bigg/\frac{(m+n)!}{m!\,n!}$$

is the chance of drawing t marked individuals in a sample of n taken at random from a larger
group m + n which contains a marked individuals. And as we should expect, within the double
samples in which p + r = a, the frequency of samples with different values of p from 0 to a follows
this law.


Totals column of Table VII (p + r = 0 to 15): 780.0, 768.0, 821.0, 779.0, 792.0, 769.0, 739.0, 727.0, 694.0, 630.0, 643.0, 670.0, 682.0, 668.0, 616.0, 568.0.
Applying the χ² test, we find the results given in Table VIII; the average
value of P(χ²) is .358. Here

X² = 106.0789, N = 100, j = 17, N′ = 84,

from which P(χ²) = .045.


TABLE VIII.

Goodness of Fit Tests; applied to observation and theory
among double samples with the same value of p + r.

 p + r      χ²        n′     P(χ²)
   1       1.0422      2      .31
   2       5.4267      3      .07
   3       1.3590      4      .71
   4       5.0774      5      .28
   5       6.6918      5      .16
   6       2.8315      6      .73
   7       3.5417      6      .62
   8       6.8965      6      .23
   9      10.1552      7      .12
  10      17.3595      7      .01
  11       8.9641      7      .18
  12       7.4287      7      .29
  13       6.2253      7      .40
  14       1.9983      7      .92
  15       5.7707      7      .45
  16       5.7921      7      .45
  17       9.5182      7      .15
Totals   106.0789    100    Average .358





This final test appears to avoid the introduction of the uncertain function F(P), 
and we have obtained by it a measure of the closeness of fit of observation to 
theory which is but very little higher than we obtained on the assumption that 
F(P) could be represented by two sloping straight lines. This seems to suggest 
that no further adjustment of F(P) will materially improve the fit, so that there 
appears definitely to be some lack of correspondence between observation and
theory, a divergence not very serious perhaps, but of the order of .05 on the “P, χ²”
scale. This can be located more clearly by a comparison of the moments of the
observed frequency distributions with the theoretical values obtained on various
hypotheses.


6. Table IX contains the values of the Mean, Standard Deviation and the
constants β₁ and β₂ of the frequency distributions of values of r found in samples
of m following p in samples of n. They have been calculated,

(1) On the simple Bayes’ hypothesis, from relations (iv).

(2) On the assumption that F(P) is of form (v) with α = −.25, by substituting
in (iv)

p + α for p, q + α for q and n + 2α for n.
(3) On the assumption that F(P) is represented by the two straight lines of
(xv); the moments were here computed from the frequencies given in Table V. 

(4) From the observed frequencies given in Table I. 

No corrections for grouping have been used in any of the four cases. 


First compare the series of moments calculated on the hypothesis (2)—where 
F(P) is assumed to be a U-curve—with those obtained from the simple Bayes’ 
hypothesis (1). It will be seen that for p=0 the differences are large, but as p 
increases the two sets of values more and more nearly agree, until when p=3 the 
differences, except in the case of the mean, are not greater than the probable 
errors*. It is of course the steep rise of the U-curve distribution of F(P) that 
modifies the moments at p=0 so greatly. The differences in (2) are in the sense 
of a lower mean, a smaller standard deviation and greater skewness. It has been 
shown† that to a first order


, , a j a 
wlan (L4 poy), mmm (1 Sa) 

That is to say the difference between the moments of (1) and (2) does not
depend primarily on the values of either n or m. For example, if the size of the
first sample were 100 instead of 20, while the second sample remains at 15,
the following moments will be found for the distribution in second samples when
p = 3:

              Mean      σ       β₁      β₂
α = 0         .588     .801    2.13    5.35
α = −.25      .554     .779    2.28    5.53

The percentage differences will be found here to be only slightly greater than
between the corresponding moments in Table IX, for n = 20 and p = 3; the average
in the one case is 4.7 %, in the other 4.2 %. Beyond p = 3, the differences will
steadily decrease.
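The means and standard deviations quoted for hypotheses (1) and (2) can be checked with the moments of the beta-binomial distribution. This sketch assumes that relations (iv), which are not reproduced here, reduce to those standard moments once P is given a Beta(p + α + 1, q + α + 1) distribution after the first sample; the function name is hypothetical.

```python
from math import sqrt

def predictive_mean_sd(p, n, m, alpha=0.0):
    """Mean and s.d. of r when F(P) ~ P^alpha (1-P)^alpha (alpha = 0: simple Bayes)."""
    a = p + alpha + 1.0           # Beta parameters for P after the first sample
    b = (n - p) + alpha + 1.0
    mean = m * a / (a + b)
    var = m * a * b * (a + b + m) / ((a + b) ** 2 * (a + b + 1.0))
    return mean, sqrt(var)

# Table IX, p = 0, n = 20, m = 15
print(predictive_mean_sd(0, 20, 15))          # simple Bayes
print(predictive_mean_sd(0, 20, 15, -0.25))   # U-curve with alpha = -.25
# Example in the text: n = 100, m = 15, p = 3
print(predictive_mean_sd(3, 100, 15))
print(predictive_mean_sd(3, 100, 15, -0.25))
```

To three decimals these give .682 and 1.023 against .523 and .905 for p = 0 with n = 20, and .588 and .801 against .554 and .779 for the n = 100 example, matching the figures in the text. The higher moments β₁ and β₂ are not attempted here.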

What conclusions can be drawn from this? It shows that whether the distri- 
bution of “chances” among the populations sampled is rectangular or follows the 
U-curve of (v) with an α of about −.25, the frequency curves for values of r in
second samples, as judged by the values of their first four moments, do not differ
greatly if p be larger than 3 or 4. That is to say if some uncertainty exists
regarding the form of F(P), then the larger the size of the first sample, n, is, the
smaller is the fraction of the range of possible values of p affected by this uncertainty.
Further, even if a U-curve does not provide a very satisfactory fit, it appears likely
that the same general rule will hold for any other curve not differing very
markedly from it (as, for example, the two sloping straight lines of Equations
(xv)). This may appear a vague and unsatisfactory position from the standpoint 
of strict analysis, but it will be seen later that it is of considerable value from the 
practical point of view. 

* The probable errors given are calculated for the moments found on hypothesis (3), where the size
of sample is in each case that actually observed, i.e. of the order of 1000.
† On p. 396.











TABLE IX. Moments of Distributions in second samples on various hypotheses.

                                    Mean             σ               β₁              β₂
p = 0
(1) F(P), constant                  .682            1.023            3.71            7.68
(2) F(P), a U-curve                 .523             .905            5.07            9.52
(3) F(P), two sloping lines         .663 ± .019     1.003 ± .013     3.70            7.56
(4) Observation                     .641             .989            3.66            7.11
p = 1
(1) F(P), constant                 1.364            1.412            1.58            4.81
(2) F(P), a U-curve                1.221            1.349            1.85            5.18
(3) F(P), two sloping lines        1.327 ± .025     1.389 ± .018     1.61 ± .16      4.84 ± .37
(4) Observation                    1.322            1.345            1.36            4.38
p = 2
(1) F(P), constant                 2.045            1.686             .87            3.86
(2) F(P), a U-curve                1.919            1.648             .98            4.00
(3) F(P), two sloping lines        1.991 ± .030     1.658 ± .021      .88 ± .10      3.84 ± .21
(4) Observation                    1.899            1.619            1.15            4.86
p = 3
(1) F(P), constant                 2.727            1.895             .53            3.40
(2) F(P), a U-curve                2.616            1.872             .58            3.47
(3) F(P), two sloping lines        [illegible]      1.867 ± .025      .55 ± .07      3.42 ± .17
(4) Observation                    [illegible]      1.734             .39            3.15
p = 4
(1) F(P), constant                 3.409            2.059             .33            3.13
(2) F(P), a U-curve                3.314            2.046             .36            3.16
(3) F(P), two sloping lines        3.323 ± .039     2.031 ± .028      .34 ± .05      3.13 ± .13
(4) Observation                    3.353            2.092             .53            3.44
p = 5
(1) F(P), constant                 4.091            2.188             .20            2.96
(2) F(P), a U-curve                4.012            2.183             .22            2.98
(3) F(P), two sloping lines        3.994 ± .042     2.167 ± .030      .22 ± .04      3.00 ± .12
(4) Observation                    3.989            2.236             .18            2.87
p = 6
(1) F(P), constant                 4.773            2.288             .12            2.85
(2) F(P), a U-curve                4.709            2.289             .13            2.85
(3) F(P), two sloping lines        4.670 ± .045     2.276 ± .032      .14 ± .03      2.88 ± .10
(4) Observation                    4.607            2.282             .15            2.86
p = 7
(1) F(P), constant                 5.455            2.363             .06            2.77
(2) F(P), a U-curve                5.407            2.368             .07            2.77
(3) F(P), two sloping lines        5.354 ± .049     2.360 ± .035      .07 ± .02      2.78 ± .08
(4) Observation                    5.272            2.251             .02            2.84
p = 8
(1) F(P), constant                 6.136            2.415             .03            2.72
(2) F(P), a U-curve                6.105            2.423             .03            2.72
(3) F(P), two sloping lines        6.057 ± .050     2.430 ± .035      .04 ± .01      2.73 ± .07
(4) Observation                    6.029            2.354             .13            2.77
p = 9
(1) F(P), constant                 6.818            2.446             .01            2.70
(2) F(P), a U-curve                6.802            2.456             .01            2.69
(3) F(P), two sloping lines        6.772 ± .054     2.471 ± .038      .008 ± .003    2.67 ± .07
(4) Observation                    6.640            2.449             .029           2.76
p = 10
(1) F(P), constant                 7.500            2.456             .00            2.69
(2) F(P), a U-curve                7.500            2.466             .00            2.68
(3) F(P), two sloping lines        7.500 ± .075     2.488 ± .053      .00            2.67 ± .09
(4) Observation                   [7.500]           2.411            [.00]           2.71
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Returning again to Table IX, it is seen that the values of o, 8, and B, obtained 
on hypothesis (3)—that F(P) is two sloping straight lines—differ only very slightly 
from those of the simple Bayes’ Theorem, (1). The Means of (3) are less than 
those of (1) by a varying amount which reaches a maximum of ‘103 when p= 6, 
but has a maximum percentage value of 34 7 when p=0. As might be expected 
from the results of the x? Goodness of Fit Test, the moments of the observations, 
(4), agree on the whole most closely with the moments of (3). The extent and 


sign of the difference is summarised in Table X, where I have given the differences,
(4) − (3), in terms of the probable errors of (3).

TABLE X.
Deviations, Moments of observed distributions − Moments of
theoretical distributions (F(P) two sloping lines), in
terms of probable errors of the latter.

  p     Mean      σ        β₁          β₂
  0    −1.15    −1.02    −(small)    −(small)
  1    −0.17    −2.44    −1.6        −1.3
  2    −3.05    −1.80    +2.7        +4.9
  3    −3.67    −5.25    −2.2        −1.6
  4    +0.77    +2.18    +3.8        +2.4
  5    −0.12    +2.31    −1.0        −1.0
  6    −1.41    +0.21    +0.6        −0.2
  7    −1.68    −3.17    −2.6        +0.7
  8    −0.55    −2.17    +9.6        +0.7
  9    −2.45    −0.58    +6.8        +1.3
 10      —      −1.46      —         +0.4

We see:

(a) The means of the observed distributions show a systematic defect (there is
one exception, for p = 4), which amounts on the average to 1.35 times the probable
error.

(b) The observed standard deviations are also smaller than the theoretical
values in all cases except three.

(c) The observed values of β₁ and β₂ are sometimes too large and sometimes
too small, but do not appear to show any systematic divergence. In particular,
for low values of p, the skewness of the observed curves is not significantly below
the theoretical value*.

(d) There are considerably more differences of over 2 to 3 times the probable 
error than we should expect to find if the variations were simply the result of 
random sampling. 

* For p = 0 the values of β₁ and β₂ fall far outside the range of the tables for probable errors, but
I think that the observed differences (−.04 for β₁ and −.45 for β₂) must be considerably less than the
probable errors.

The systematic difference in mean and standard deviation can only, I think,
be in small part due to an inadequacy of the lines of Equations (xv) in representing
the distribution F(P) of the populations sampled, for it has been seen that the
test which avoided the assumptions regarding the form of this function gave a
value of P(χ²) of only .045. A possible cause may lie in the process of sampling.
If there had been a certain degree of instability among the sampled populations,
there would have been a tendency for the proportion r/m to fall somewhat nearer
the modal value of F(P) (that is, in general, to have a lower value*) than if the
constitution of the population had remained constant for both samples. This
would result in a lower mean, and also a smaller† standard deviation in the second
sample than the theory would predict.


But if it is necessary to admit a certain amount of discordance between
observation and theory, we should be guilty of a lack in sense of proportion if we
were to condemn Bayes' Theorem on these grounds. A glance at the distributions
of Table I shows that in a very real sense there is law and order among the figures.
With the observed series of distributions changing gradually from J-shaped to
symmetrical curves as p increases, it would be foolish to insist on some theoretical
grounds that it was useless to attempt to give numerical expression to the
prediction of future events. There is clearly regularity where some of the critics
would apparently expect an irregular jumble of figures, and the preceding analysis
has shown that observation and theory are not so very discordant. The greatest
error in the value of a mean is only 5 % (in the two cases, p = 2 and 3), while the
variation among the observed values of r is more often than not less than that
predicted by theory.


It must not of course be forgotten that the best fit has only been obtained by 
making use of an a posteriori curve, F(P); we shall see more clearly, after dealing 
with the second series of experimental samplings, how far it is possible to proceed 
without this help. 


7. Series II. The first series of experiments have shown how far, if repeated
samples of a fixed size are taken, the resulting distributions of r for a given p
follow the theoretical rules, and the results have been on the whole encouraging.
In actual practice, however, the statistician is not dealing repeatedly with single
values of n and m; indeed the same values may never recur in all his experience.
The position is the same in almost all statistical problems, whether we are making
use of the criterion of the simple probable error or applying the χ² Test for
Goodness of Fit. Theory gives the result to be expected in repeated tests under the
same conditions, while in practice the tests are required in a number of single
cases under different conditions. In our particular problem, the statistician wishes
to know whether, if he applies Bayes' Theorem‡ in some 50 or 100 different
problems, the range for r that he predicts will not be exceeded in his second
samples more often than theory expects. If the theory lays down that in k %
of cases the number of "black balls" in the second sample of m should lie between
limits L(p, n, m) and L′(p, n, m), then he wishes to have some confidence in
expecting that out of N problems with differing values of p, n and m which his
work may bring before him, he will be correct in predicting that r lies between L
and L′ on Nk/100 ± ε occasions, where ε is a small quantity, such that ε/N approaches
zero as N increases. Suppose that in each of some 100 problems he calculates the
range within which 99 % of the r's of second samples should lie, then if he accepts
these ranges for purposes of prediction will he be only in error in one or about one
case out of the hundred? It appears to me that it is only by this reference to the
results of a number of samples that "probability limits" and "odds for" or "against"
can be given any useful interpretation, and this is the way that it is proposed to
look at the problem in dealing with the second experiment. I shall make use of
Bayes' Theorem, assuming F(P) to be constant, to calculate in each of 300 cases
(in which n and m differ) the limits within which, (a) 90 %, (b) 99 %, of the
values of r should fall if repeated samples were taken, then apply these limits to
the single instances and so find out whether in the long run the theory plays false.

* In view of the U- or V-shaped form of this function F(P).
† Smaller, because the lower the P of the population, the less the σ.
‡ Assuming F(P) constant or to have some other definite value determined a priori.
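The calculation proposed here can be sketched numerically. The following is only an illustration under stated assumptions: it takes F(P) constant, writes the chance of r marked individuals in a second sample of m (after p in a first sample of n) as the term of Bayes' hypergeometrical series, and cuts the tails on the discrete series itself rather than on a fitted Pearson curve as the Appendix Tables do. The function names and the values p = 3, n = 20, m = 15 are hypothetical.

```python
from fractions import Fraction
from math import comb


def bayes_predictive(p, n, m):
    """Chance of r = 0..m in a second sample of m, given p out of n, F(P) constant."""
    q = n - p
    total = comb(n + m + 1, m)
    return [Fraction(comb(p + r, r) * comb(q + m - r, m - r), total)
            for r in range(m + 1)]


def central_limits(pmf, percent):
    """Lowest and highest r kept after cutting at most (100 - percent)/2 % from each tail."""
    tail = Fraction(100 - percent, 200)
    cum, lo = Fraction(0), 0
    for r, w in enumerate(pmf):
        cum += w
        if cum > tail:          # r is the first value that cannot be cut off
            lo = r
            break
    cum, hi = Fraction(0), len(pmf) - 1
    for r in range(len(pmf) - 1, -1, -1):
        cum += pmf[r]
        if cum > tail:
            hi = r
            break
    return lo, hi


pmf = bayes_predictive(p=3, n=20, m=15)   # illustrative values only
print("90 % limits:", central_limits(pmf, 90))
print("99 % limits:", central_limits(pmf, 99))
```

Because the series is discrete, the range returned covers at least the stated percentage; the limits of the paper, read from continuous curves, need not coincide exactly with these.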


The following notes indicate the chief points observed in collecting the 300 
samples : 

(1) As far as was humanly possible the propositions or subjects for recording 
were chosen in a haphazard manner without forethought as to possible values of P. 
To assist in making the selection as unprejudiced as possible, all but about 10 %
of the propositions were noted down during the course of a fortnight before the
recording was begun, and the remaining 30 were all fixed upon before any attempt
was made to reduce the data.

(2) The values of n and m varied from 15 to 600 and were combined with
great variety; m was often taken larger than n*, so that the second sample was to
be predicted from a smaller first sample.

(3) Particulars of twenty of the double samples are given in Table XVIII and 
are discussed later. They were chosen randomly from the 300, viz. their numbers 
in the order of record were 15, 30, 45, ..., 300. 

(4) In the majority of cases n and m were taken as multiples of 10; this made 
no difference to the value of the experiment, but considerably eased the computing. 
As a result certain combinations of n and m appeared a number of times. 

(5) The recording was much simplified by using one of Galton’s “ Pocket 
Recorders,” several of which are preserved in the Galton Laboratory. That used 
was a small instrument with 5 keys each working a numbered dial, fitting the fingers 
and thumb of the hand, which can be conveniently carried and worked un- 
observed in the pocket. It was possible to take as many as three different 
samples at the same time. 

The distribution for the 300 samples of the ratios (p + r)/(m + n) is shown in
Fig. 3†. It is seen again to be of U- or V-shape with the same suggestion of a
maximum between the proportions .1 and .2 as in the previous experiment. The
form of this divergence from linearity in the F(P) distribution will be considered
later, but we will proceed with the test on the assumption (afterwards shown to be
justified) that this divergence is of no practical importance.

* In 129 cases.
† It has been made symmetrical about 0.5 for the same reason as before.








Fig. 3. Distribution of Frequencies of (p + r)/(n + m) in 300 samples (made symmetrical).
(a) The histogram represents the observed frequencies.
(b) The broken line is the curve y = 22.80 x^(−.1336) (1 − x)^(−.1336).
(c) A second curve, y = 9.8376 x^(−.6620) (1 …


After collecting the samples the following method of analysis was carried out.
For each double sample the position of the Mean, and the values of σ, β₁ and β₂
for the frequency distribution of r in second samples were calculated from the
relations (iv); this was a straightforward if very lengthy piece of computation.
For each of these frequency curves the position of the ordinates bounding what
may be termed the central 90 % and central 99 % of frequency* were obtained
by help of the Tables in the Appendix. These limits were chosen partly to form
a test for the present experiment, but also because it seemed likely that they
would be of value in other problems. Had tables of the Incomplete Beta-Function
been available any percentile limits could have been chosen at will
and the labour involved in calculating these four tables would have been
avoided. The method by which the tables were computed is described below, but
it is of some interest to consider the range which they are needed to cover.
Taking the two last equations of (iv) it is found on eliminating p and q that
β₁ and β₂ satisfy the linear relation,

\[ \beta_2 = \frac{n+4}{n+5}\cdot\frac{(n+1)(n+2)+6m(m+n+2)}{(n+2m+2)^2}\,\beta_1 + \frac{n+3}{n+5}\left\{3-\frac{2(n+2)}{m(m+n+2)}\right\} \qquad \text{(xix)}. \]

That is to say, for a given size of the two samples the point in the β₁, β₂
diagram representing the distributions in second samples lies on a straight line,
its distance from the β₂ axis increasing as p/n decreases from 0.5 to 0.0. Now
write m = nλ in (xix); then

\[ \beta_2 = \frac{n+4}{n+5}\cdot\frac{(n+1)(n+2)+6n\lambda\{n(1+\lambda)+2\}}{\{n(1+2\lambda)+2\}^2}\,\beta_1 + \frac{n+3}{n+5}\left\{3-\frac{2(n+2)}{n\lambda\{n(1+\lambda)+2\}}\right\}, \]

or approximately if n is not too small,

\[ \beta_2 = \frac{1+6\lambda(1+\lambda)}{(1+2\lambda)^2}\,\beta_1 + 3 - \frac{2}{\lambda(1+\lambda)n} \qquad \text{(xx)}. \]

First consider the slope of this line,

\[ a = \frac{1+6\lambda(1+\lambda)}{(1+2\lambda)^2}, \qquad \frac{da}{d\lambda} = \frac{2}{(1+2\lambda)^3}. \]

Thus a increases with λ but at a decreasing rate; we have,

(1) λ = 0, or m is very small compared to n; then (xx) tends to

\[ \beta_2 - \beta_1 - 3 + \frac{2}{m} = 0, \]

which is the line giving curves corresponding to the binomial (p + q)^m.

(2) λ = 1, or m = n; then a = 13/9, or a little less than 3/2, the slope of the Type III
line.

(3) λ → ∞, or m is very large compared with n; then in the limit (xx)
becomes the Type III line

\[ 2\beta_2 - 3\beta_1 - 6 = 0. \]

The constant term in (xx), or

\[ b = 3 - \frac{2}{\lambda(1+\lambda)n}, \]

shows that the line cuts the β₂ axis where β₂ < 3 and the point of intersection
tends to 3, the Gaussian Point, as m is made large. b will decrease as m and n
decrease; taking as a lower limit for satisfactory fit of curve to histogram
n = m = 15*, we find that (xix) cuts the β₂ axis where β₂ = 2.64.

* I.e. the ordinates cutting off (1) the 5.0 % tails, (2) the 0.5 % tails of the area under the
frequency curve.
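The linear relation (xix) lends itself to a direct numerical check. The sketch below is an illustration, not the author's computation: it assumes F(P) constant, builds the terms of Bayes' hypergeometrical series in exact fractions, computes β₁ and β₂ from the moments, and compares them with the slope and constant term of (xix). The function names and the trial values n = 20, m = 15 are hypothetical.

```python
from fractions import Fraction
from math import comb


def beta_coefficients(p, n, m):
    """Exact beta_1 and beta_2 of the distribution of r in second samples (F(P) constant)."""
    q = n - p
    total = comb(n + m + 1, m)
    pmf = [Fraction(comb(p + r, r) * comb(q + m - r, m - r), total)
           for r in range(m + 1)]
    mean = sum(r * w for r, w in enumerate(pmf))

    def mu(k):
        # k-th moment about the mean
        return sum((r - mean) ** k * w for r, w in enumerate(pmf))

    mu2, mu3, mu4 = mu(2), mu(3), mu(4)
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2


def line_xix(n, m):
    """Slope and constant term of the straight line (xix)."""
    slope = Fraction(n + 4, n + 5) * Fraction(
        (n + 1) * (n + 2) + 6 * m * (m + n + 2), (n + 2 * m + 2) ** 2)
    constant = Fraction(n + 3, n + 5) * (3 - Fraction(2 * (n + 2), m * (m + n + 2)))
    return slope, constant


slope, constant = line_xix(20, 15)          # illustrative sample sizes
for p in range(0, 11):
    b1, b2 = beta_coefficients(p, 20, 15)
    print(p, float(b1), float(b2), b2 == slope * b1 + constant)

print("constant term for n = m = 15:", float(line_xix(15, 15)[1]))
```

With n = m = 15 the constant term comes out a little above 2.63, in agreement with the value 2.64 quoted for the point where (xix) cuts the β₂ axis.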

Thus for the Bayes' hypergeometrical series the point β₁, β₂ will fall within
the Type I area and in the cases likely to be met with will probably not fall
outside the area between

   2β₂ − 3β₁ − 5 = 0,
   2β₂ − 3β₁ − 6 = 0.

These limits were taken into account in forming the Tables of the Appendix.

* For lower values of n and m Greenwood's tables will be found very useful (Tables for Statisticians
and Biometricians, Table XLVIII). These give the percentage frequencies of the actual terms of the
hypergeometrical series for a wide range of combinations of n and m.

The actual distribution of values of β₁, β₂ found in the 300 samples is shown
in grouped form in Table XI. The majority of points lie close to the β₂ axis but
a long tail runs out across the sheet representing distributions of increasing
skewness; these are of course the distributions in which p/n was 0, or very small.

TABLE XI. Distribution of β₁ and β₂.

| Totals | 193 | 39 | 21 | 12 | 5 | 4 | 1 | 5 | 2 | 3 | 3 | 2 | — | 1 | — | 2 | 4 | — | — | 1 | 298 |

The Table does not include the two outlying observations
β₁ = 6.07, β₂ = 10.89, and β₁ = 9.58, β₂ = 14.35.

The correlation between the 298 values* of β₁ and β₂ is +.9921 and the regression
line of β₂ on β₁ is

B. = 142228, + TEBE ..........0..rcereecseeece (xxi) 


a line sloping gradually away from the Type III line as β₁ increases, and drawn in
Figure 5 (p. 437).


The 90 % and 99 % limits of r are those within which we should expect to
find corresponding frequencies to fall, if a great number of series of sampling
experiments were carried out. For such series of 300, the mean number of cases
in which r should fall within the 90 % limits is

   300 × 90/100 = 270,

and the variation of the number in individual series about this is given by the
terms of the binomial†, (9/10 + 1/10)^300, or represented by a standard deviation of
5.20. Similarly for the 99 % limits the mean number is 297 and the standard
deviation of the binomial, 1.72.

(a) We may use the Tables of the Appendix. It will then be found for the
actual observations that

   271 lay within the central 90 % limits,
   294 lay within the central 99 % limits.

The first of these numbers is in excellent agreement with theory and the second
is not unexpectedly far out, for the binomial (99/100 + 1/100)^300 is very skew and the
observed number, 294, is only at a distance of 1.74 times the standard deviation
from the mean, 297, in the direction of the extended tail‡.
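The figures quoted for the two binomials follow from the ordinary moment formulae, as the short sketch below illustrates. The helper name is hypothetical; the expressions for β₁ and β₂ are the standard ones for a binomial with chance P of success in N trials.

```python
from math import sqrt


def binomial_summary(trials, chance):
    """Mean, standard deviation, beta_1 and beta_2 of a binomial distribution."""
    q = 1 - chance
    variance = trials * chance * q
    beta1 = (q - chance) ** 2 / variance
    beta2 = 3 + (1 - 6 * chance * q) / variance
    return trials * chance, sqrt(variance), beta1, beta2


mean90, sd90, _, _ = binomial_summary(300, 0.90)
mean99, sd99, beta1_99, beta2_99 = binomial_summary(300, 0.99)

print(mean90, round(sd90, 2))             # expected number inside the 90 % limits and its s.d.
print(mean99, round(sd99, 2))             # the same for the 99 % limits
print(round(beta1_99, 3), round(beta2_99, 3))
print(round((294 - mean99) / sd99, 2))    # observed 294 in units of the standard deviation
```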


Consider now the result of applying two other more approximate tests for
predicting the range of r in second samples:

(b) Using the Mean and Standard Deviation of Bayes' Theorem, but taking
the curve of distribution to be always Normal. If ν₁′ and σ are the first two
moments calculated as in (iv),

   the central 90 % will lie between ordinates at ν₁′ ± 1.6449 σ,
   the central 99 % will lie between ordinates at ν₁′ ± 2.5758 σ§.

(c) Using the very crude values for Mean and Standard Deviation which are
often employed, where the first sample is treated as indefinitely great,

   ₀ν₁′ = mp/n,   σ₀ = √(mpq/n²),

and again taking the curve of distribution to be always Normal. Here the central
areas will be limited as in case (b), if we substitute ₀ν₁′ for ν₁′, σ₀ for σ.

* In this calculation two extreme values were omitted, viz. β₁ = 6.07, β₂ = 10.89 and β₁ = 9.58,
β₂ = 14.35.

† As there is no correlation between the different double samples the chance of obtaining a given
number of r's within the limits is the same as of drawing that number of black balls from a bag
containing black to white in the ratio of (1) 90 to 10 and (2) 99 to 1.

‡ This binomial has for moment coefficients β₁ = .323, β₂ = 3.317, from which we find from the
Tables in the Appendix that its central 90 % of area lies between ordinates at −1.46σ and +1.80σ
from the mean. Owing to the abrupt beginning of the binomial the value −1.46σ taken from the
Type I curve may not be very satisfactory, but the deviation towards the extended tail should be
accurate.

§ These are the deviates of the Normal curve corresponding to ½(1 + α) = .95 and .995.
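The two approximate rules (b) and (c) may be put side by side in a few lines. This is a sketch under assumptions: the standard deviation used for (b) is the expression quoted from (iv) later in this section, the Normal deviates are taken from Python's `statistics.NormalDist`, and the function names and the trial values p = 5, n = 20, m = 200 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def normal_deviate(coverage):
    """Deviate of the Normal curve cutting off equal tails outside the central coverage."""
    return NormalDist().inv_cdf((1 + coverage) / 2)


def limits_b(p, n, m, coverage):
    """Rule (b): Bayes' mean and standard deviation, Normal curve."""
    q = n - p
    mean = m * (p + 1) / (n + 2)
    sd = sqrt((p + 1) * (q + 1) * m * (n + m + 2) / ((n + 2) ** 2 * (n + 3)))
    k = normal_deviate(coverage)
    return mean - k * sd, mean + k * sd


def limits_c(p, n, m, coverage):
    """Rule (c): first sample treated as indefinitely great, Normal curve."""
    q = n - p
    mean = m * p / n
    sd = sqrt(m * p * q) / n
    k = normal_deviate(coverage)
    return mean - k * sd, mean + k * sd


print(round(normal_deviate(0.90), 4), round(normal_deviate(0.99), 4))
print(limits_b(5, 20, 200, 0.99))   # second sample ten times the first
print(limits_c(5, 20, 200, 0.99))   # much narrower range from the crude rule
```

When m is large compared with n the crude rule (c) gives a far narrower range than (b), since it makes no allowance for the uncertainty of the first sample.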

These two methods of approximation have been discussed by K. Pearson in the
earlier papers on the subject (Phil. Mag. 1907, Vol. XIII. pp. 365–378, and
Biometrika, Vol. XIII. p. 12). The numbers of observations falling outside the different
ranges are summarised in Table XII.

TABLE XII.
Frequencies of values of r outside certain ranges.

                                                           90 per cent.   99 per cent.
                                                              Range          Range
Frequencies to be expected in 300 samples:
   Mean                                                        30              3
   Standard Deviation                                          5.20            1.72
Frequencies observed with limits fixed by using:
   (a) Complete Bayes' Theorem                                 29              6
   (b) ν₁′ and σ and Normal Curve                              24              5
   (c) ₀ν₁′ and σ₀ and Normal Curve                            88             36

It will be seen that there are actually fewer
values of r lying outside the limits determined as in (b), than with the complete
Bayes' Theorem constants as in (a). This does not of course mean that (b) is more
satisfactory than (a), since for the 90 % limits the frequency 29 (a) is nearer the
expected value 30 than the 24 (b); in fact by using the skew curves it is possible
to predict the range within slightly narrower limits than with the normal curves,
as can be seen from Tables (1) and (2) of the Appendix*. The difference between
6 and 5 cases lying outside the 99 % limits is of course quite insignificant. The
figures in the last row of the Table show how very far out we should be in relying
on the hypothesis (c); instead of the expected 3 cases in which the observed
value of r should lie outside the 99 % limits we find 36! This difference is
largely due to the total inadequacy of this method of prediction when the size of
the second sample is greater than that of the first.

These results seem to me to bring out remarkably well the value of Bayes'
Theorem; to have found in 300 samples only 6 in which r falls outside the
predicted limits when we were certainly prepared to find 3, has been seen not to
be an improbable result judged on the basis of probable errors calculated on the
assumption of perfect sampling. And when we are not dealing with bags of balls
but have taken 300 samples from less clearly defined populations, the result is
certainly satisfactory. The difference between the findings on hypotheses (a)
and (c) serves to emphasise this point, and shows that it cannot be argued that
any form of approximate rule would lead to as good results.

* For example, for β₁ = 0.0, β₂ = 3.0, the total range is (1.64 + 1.64)σ = 3.28σ;
for β₁ = 1.2, β₂ = 4.6, the total range is (1.26 + 1.91)σ = 3.17σ.
For the 99 % range the position is reversed.

There is however a more stringent method of analysing the results of these 300
samples. The second samples of m may be considered as 300 samples each drawn
at random from a different distribution whose moments are known, i.e. from the
300 theoretical distributions of r in second samples. Suppose in general that a
sample of N has been drawn, one individual taken randomly from each of N
frequency distributions represented by

   f₁(a), f₂(b), f₃(c), ...,

where there is no correlation between the variates a, b, c, .... Further, suppose
that for each distribution the variate is measured from the mean of that curve,
and that the variate scale is such that the standard deviation is unity. Let
square brackets imply a mean value for a particular distribution, and the sign S,
a summation for all N distributions.

Then [a_s] = 0 and [a_s²] = 1.

A particular sample will contain N individuals with variate values,
a_s, b_s, c_s, ... etc. and its qth moment will be

\[ M_q = \frac{1}{N}\left(a_s^q + b_s^q + c_s^q + \dots\right). \]

If repeated samples of N are taken in the same manner*, then the mean
value of M_q among these samples will be

\[ \bar{M}_q = \frac{1}{N}\,S\left([a_s^q]\right) = \frac{1}{N}\,S\left({}_a\mu_q\right) \qquad \text{(xxii)}, \]

while the square of the standard deviation of M_q will be

\[ \Sigma_q^2 = \overline{(M_q^2)} - (\bar{M}_q)^2 = \frac{1}{N^2}\left\{S[a_s^{2q}] + 2S'[a_s^q b_s^q]\right\} - \frac{1}{N^2}\left\{S({}_a\mu_q)\right\}^2, \]

where S′ indicates the sum for all possible different combinations of the variates,
a and b, a and c, b and c, etc.; there will be ½N(N − 1) such combinations. As
there is no correlation between a_s^q and b_s^q, it follows that

\[ [a_s^q b_s^q] = [a_s^q]\,[b_s^q] = {}_a\mu_q \times {}_b\mu_q. \]

Therefore

\[ \frac{2S'}{N^2}[a_s^q b_s^q] = \frac{2}{N^2}\,S'({}_a\mu_q \times {}_b\mu_q)
 = \frac{1}{N^2}({}_a\mu_q + {}_b\mu_q + \dots)^2 - \frac{1}{N^2}({}_a\mu_q^2 + {}_b\mu_q^2 + \dots)
 = \frac{1}{N^2}\{S({}_a\mu_q)\}^2 - \frac{1}{N^2}\,S({}_a\mu_q^2). \]

Hence

\[ \Sigma_q^2 = \frac{1}{N^2}\,S({}_a\mu_{2q}) - \frac{1}{N^2}\,S({}_a\mu_q^2) \qquad \text{(xxiii)}, \]

where q may now be given the special values 1, 2, 3 and 4.
In view of the initial conditions,

   ₐμ₁ = [a_s] = 0,   ₐμ₂ = [a_s²] = 1,
   ₐμ₃ = √(ₐβ₁),   ₐμ₄ = ₐβ₂,
   ₐμ₆ = ₐβ₄,   ₐμ₈ = ₐβ₆,

so that we obtain from (xxii) and (xxiii),

\[ \begin{aligned}
 \bar{M}_1 &= 0, & \Sigma_1 &= \frac{1}{\sqrt{N}}, \\
 \bar{M}_2 &= 1, & \Sigma_2 &= \frac{1}{\sqrt{N}}\left(\text{Mean }\beta_2 - 1\right)^{1/2}, \\
 \bar{M}_3 &= \text{Mean }\sqrt{\beta_1}, & \Sigma_3 &= \frac{1}{\sqrt{N}}\left(\text{Mean }\beta_4 - \text{Mean }\beta_1\right)^{1/2}, \\
 \bar{M}_4 &= \text{Mean }\beta_2, & \Sigma_4 &= \frac{1}{\sqrt{N}}\left(\text{Mean }\beta_6 - \text{Mean }\beta_2^2\right)^{1/2},
\end{aligned} \qquad \text{(xxiv)} \]

where I have written "Mean" for (1/N) S( ), implying the mean value of a frequency
constant among the N distributions.

* One individual from each distribution, where the total frequency of each is supposed infinite.
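Equations (xxiv) reduce to a few square roots once the mean β's are known. The sketch below is an illustration only; the function name is hypothetical, and the numerical inputs are the mean values quoted in the text for the 300 samples (Mean β₁ = .4011, Mean β₂ = 3.4388, Mean √β₁ = .4562, Mean β₂² = 13.2418, Mean β₄ = 25.71, Mean β₆ = 306).

```python
from math import sqrt


def expected_moments(N, mean_sqrt_b1, mean_b1, mean_b2, mean_b2_sq, mean_b4, mean_b6):
    """Mean values and standard deviations of M_1..M_4 from equations (xxiv)."""
    means = [0.0, 1.0, mean_sqrt_b1, mean_b2]
    sds = [
        sqrt(1 / N),
        sqrt((mean_b2 - 1) / N),
        sqrt((mean_b4 - mean_b1) / N),
        sqrt((mean_b6 - mean_b2_sq) / N),
    ]
    return means, sds


means, sds = expected_moments(
    N=300, mean_sqrt_b1=0.4562, mean_b1=0.4011, mean_b2=3.4388,
    mean_b2_sq=13.2418, mean_b4=25.71, mean_b6=306.0)

for q, (m_q, s_q) in enumerate(zip(means, sds), start=1):
    print(q, m_q, round(s_q, 4))
```

For q = 1, 2 and 4 this gives .0577, .0902 and .9879. For q = 3 the quoted inputs give about .290, so a third-moment standard deviation slightly different from this would imply a slightly different reading of Mean β₄.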

The values of β₁ and β₂ having been calculated already it was easy to obtain
the values of M̄_q and Σ_q and to compare with these the observed values of M₁, M₂,
M₃, M₄. It was first found from the original frequencies (which are grouped in
Table XI) that

   Mean β₁ = .4011,    Mean β₂ = 3.4388,
   Mean √β₁ = .4562,   Mean β₂² = 13.2418.

And then approximately,

   Mean β₄ = 25.71,    Mean β₆ = 306,

which are the values of β₄ and β₆ corresponding to β₁ = .4011, β₂ = 3.4388 (or the
mean values) in Pearson Type curves*. From these M̄_q and Σ_q were computed as
shown in Table XIII. The observed values of M_q were calculated from the frequency
distribution of Table XIV, formed by entering as variate each of the 300 values of

   z = (r − ν₁′)/σ,

where r is as usual the observed number of "marked" individuals in
the second sample, and ν₁′ and σ the mean and standard deviation of the
corresponding distribution of Bayes' Theorem. M₁, M₂, M₃ and M₄ are given in Table
XIII, both for the 300 observations and for 299, the latter excluding one, No. 257†,
in which the deviation of r from the mean was 4.8 times the standard deviation.

* Interpolated from a new Table of the higher β's computed by Professor Yasukawa, which will
shortly be published.
† For No. 257, p and r were the number of persons in random samples from "Who's Who," 1899,
who were authors (as shown by quoting publications). A first sample gave 86 out of 200, a second only
14 out of 100. Possibly the pocket recorder may have been in error, but the observation was accepted
at the time of record as presumably accurate, and cannot be altogether rejected.

TABLE XIII.
Moments of Distribution of z on Bayes' Hypothesis.

        Mean values in repeated          Observed values of M_q
        series of samples
  q      M̄_q        Σ_q          For the 300 samples    For 299 samples
  1      .0000      .0577            −.0173                 −.0010
  2     1.0000      .0902            1.0929                 1.0163
  3      .4562      .2869            −.2388                 +.1538
  4     3.4388      .9879            4.7817                 2.8700

TABLE XIV.
Frequency Distribution of z = (r − ν₁′)/σ observed among
the 300 double Samples.

  Limits of z        Frequency        Limits of z        Frequency
  −5.0 to −4.8           1            +0.0 to +0.2          18
     .  .  .                          +0.2 ,, +0.4          17
  −3.0 to −2.8           …            +0.4 ,, +0.6          18
  −2.8 ,, −2.6           2            +0.6 ,, +0.8          15
  −2.6 ,, −2.4           …            +0.8 ,, +1.0          17
  −2.4 ,, −2.2           1            +1.0 ,, +1.2          18
  −2.2 ,, −2.0           3            +1.2 ,, +1.4          17
  −2.0 ,, −1.8           …            +1.4 ,, +1.6           4
  −1.8 ,, −1.6           1            +1.6 ,, +1.8           6
  −1.6 ,, −1.4          10            +1.8 ,, +2.0           1
  −1.4 ,, −1.2          14            +2.0 ,, +2.2           3
  −1.2 ,, −1.0          16            +2.2 ,, +2.4           2
  −1.0 ,, −0.8          14            +2.4 ,, +2.6           2
  −0.8 ,, −0.6          20            +2.6 ,, +2.8           1
  −0.6 ,, −0.4          28            +2.8 to +3.0           1
  −0.4 ,, −0.2          24
  −0.2 to −0.0          22
                                      Total                300
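The deviation quoted for the double sample No. 257 can be reproduced from the figures given for it (86 out of 200 in the first sample, 14 out of 100 in the second). The sketch assumes F(P) constant and uses the mean m(p + 1)/(n + 2) together with the expression for σ² quoted from (iv) later in this section; the function name is hypothetical.

```python
from math import sqrt


def bayes_mean_sd(p, n, m):
    """Mean and standard deviation of r in a second sample of m, F(P) constant."""
    q = n - p
    mean = m * (p + 1) / (n + 2)
    variance = (p + 1) * (q + 1) * m * (n + m + 2) / ((n + 2) ** 2 * (n + 3))
    return mean, sqrt(variance)


mean_257, sd_257 = bayes_mean_sd(p=86, n=200, m=100)
z_257 = (14 - mean_257) / sd_257

print(round(mean_257, 2), round(sd_257, 2), round(z_257, 2))
```

The value of z so found falls in the class −5.0 to −4.8 of Table XIV.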





M₁ and M₂ are very close to M̄₁ and M̄₂, the "expected" values, if we omit the
double sample No. 257, and if it is included M₂ − M̄₂ is still only of the same order
as the standard deviation, or 1½ times the probable error. With only 300 samples
the values of Σ₃ and Σ₄ are so large that little useful information can be drawn
from the observed values of M₃ and M₄. The inclusion of the single observation
with a deviation of 4.8σ changes the positive skewness to negative, and increases
M₄ from 2.87 to 4.78, but little importance can be attached to this although it
suggests that the observed distribution of Table XIV is not as skew as we should
have anticipated. The more skew the theoretical distribution of r, the more skew
should be the distribution of z. Taking only those 47 samples in which the β₁ of
the r distribution was greater than 0.6, results were obtained as shown in Table XV.

TABLE XV.
Moments of Distribution of z in Samples where β₁ > 0.6.

        Mean values in repeated          Observed value of
        series of samples                M_q in 47 samples
  q      M̄_q        Σ_q
  1     0.000       .146                    −.083
  2     1.000       .307                     .994
  3     1.28       1.27*                    +.47
  4     5.42       6.16*                    3.24

* These two values lying beyond the range of the existing tables were calculated from the finite
difference equations between successive β's of Pearson's curves.

As far as any conclusions can be drawn from a result in which the values of Σ_q
are so large, we may note that:

(1) M₁ and M₂ are again very near the expected values, showing that there is
a marked consistency in the results; the variation in those particular samples in
which β₁ is large (or the cases in which the p of the first sample of n was very
small) is no more and no less than throughout the whole series of 300.

(2) As it should be, the value of M₃ is here greater than in the case of the
whole 300 samples, but again it is considerably below the expected value. This
divergence may perhaps be partly due to the fact that the distribution of M₃ is
probably skew, so that the observed M₃ may lie nearer its modal value than it does
to its mean.

Now let us consider what results would have been obtained had we made use
of the approximate prediction rule, taking, as on p. 419,

   ₀ν₁′ = mp/n,   σ₀ = √(mpq/n²),

and supposing all the curves of distribution of r to be Normal. If this method of
prediction was adequate we should be justified in comparing the moments of the
observed distribution of

   z′ = (r − ₀ν₁′)/σ₀

with the mean values ₀M̄_q and ₀Σ_q given in Table XVI†. It has been necessary to
omit six cases in which p was zero but not r, so that z′ was infinite; the crude
theory does not allow for these. The last column of the Table again omits
observation No. 257, in which the value of r had here a deviation of nearly 6σ₀
from the expected value.

TABLE XVI.
Moments of Distribution of z′ on modified hypothesis.

        Mean values in repeated          Observed values of ₀M_q
        series of samples
  q      ₀M̄_q       ₀Σ_q         For the 294 samples    For 293 samples
  1      .0000      .0583            +.1701                 +.1904
  2     1.0000      .0825            2.8252                 2.7201
  3      .0000      .2259            +.4658                 1.1329
  4     3.0000      .5714           33.4501                29.7044

† Here Mean β₁ = 0, Mean β₂ = 3, Mean β₄ = 15, Mean β₆ = 105, or the Normal curve values when
standard deviation is unity.

A comparison of Table XIII with XVI shows how
hopelessly inadequate are the methods of prediction employed in the latter case;
₀M₂, ₀M₃ and ₀M₄ have increasingly impossible values, and the value of ₀M₁ (+.1701)
is far less satisfactory than M₁ (−.0173), which is a point of some importance. It
shows that not only are better predictions obtained by using the standard
deviation σ (from Equation (iv)) rather than σ₀, but that m(p + 1)/(n + 2) is definitely a better
prediction value to take for r in the second sample than is mp/n.
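The behaviour of the crude rule in the two awkward situations mentioned above (the observation No. 257, and a first sample with p = 0) can be illustrated as follows. This is a sketch only; the function name and the trial values for the p = 0 case are hypothetical, while the figures for No. 257 are those quoted in the text.

```python
from math import copysign, inf, sqrt


def crude_z(p, n, m, r):
    """Deviation of r from mp/n in units of sigma_0 = sqrt(mpq)/n (first sample taken as infinite)."""
    mean = m * p / n
    sd = sqrt(m * p * (n - p)) / n
    if sd == 0:
        # p = 0 or p = n: the crude theory allows no variation at all
        return 0.0 if r == mean else copysign(inf, r - mean)
    return (r - mean) / sd


z_crude_257 = crude_z(p=86, n=200, m=100, r=14)
print(round(z_crude_257, 2))            # nearly six times sigma_0

print(crude_z(p=0, n=20, m=15, r=2))    # infinite: p zero but not r
print(15 * (0 + 1) / (20 + 2))          # Bayes' mean m(p+1)/(n+2) stays above zero
```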

It has been shown that for the samples with high values of β₁ and β₂, M₂
remains close to the expected value of 1.0. I have applied two other tests of
consistency. The accuracy of the expression

   σ² = (p + 1)(q + 1) m (n + m + 2) / {(n + 2)² (n + 3)}   (from (iv))

as a measure of dispersion has been checked by calculating the value of M₂ for the
whole 300 samples. It is possible that heterogeneity might lie concealed under
this general measure; for example, an unnecessarily large allowance for variation
made when σ is large, which is usually the case when m > n, might be balanced in
the long run by a too small allowance in other cases, or vice versa.

The coefficient of correlation was therefore found between σ and z; it was
−.0268 ± .0389, while η_{z.σ} = .2352 against a mean value to be expected in cases of
no correlation of η̄_{z.σ} = .2082 ± .0389. There is therefore no significant relation,
and σ appears to be a satisfactory measure of variation whether the limits of
prediction are great or small. Finally I have considered the 129 cases in which
the size of the second sample was greater than the first; among these there was
great variety from the case where m was only a few units greater than n, to the
extreme case where m = 10n (e.g. n = 20, m = 200). It is found that

   in 14 cases the value of r lies outside the 90 % limits,
   in  3 cases the value of r lies outside the 99 % limits.

These frequencies are a little greater than the expected numbers of 12.9 and 1.3,
but on further analysis it is found that

   M₁ = +.0457 compared with M̄₁ = .0000 ± .0594*,
   M₂ = +.9934 compared with M̄₂ = 1.0000 ± .0927*,

which is very satisfactory.


We may therefore conclude that as far as the results of this series of experimental samplings are concerned, the use of the measures of expected frequency ($\nu_1'$)
and variation ($\sigma$) in second samples obtained from the simple Bayes’ hypothesis
in which F(P) is taken as constant, is entirely justified by the frequencies in the
observed second samples. It is a powerful feature of Bayes’ Theorem that it allows
for the first sample being much smaller than the second, and the experimental
results suggest that this allowance is neither more nor less than adequate. The
accuracy of the expressions for the higher moments cannot be certainly checked 
owing to the very high probable errors of constants determined in this special 
manner from only 300 samples, but the results of the sampling of Series I certainly 
suggest that the theoretical measures of skewness are likely in the long run to be 
of real use in prediction. 

8. In the preceding section it has been assumed that the distribution F(P) 
was rectangular, and the success obtained in the prediction of limits in second 
samples has shown that this assumption was justified. It is however interesting 
to consider the observed distribution of the ratios (p+r)/(n+m) a little more
closely; the form of distribution was shown in Fig. 3. In the first place, this has
been fitted with a Type I curve of the form of Equation (v), choosing a so as to
make the second moments of curve and observation agree. The curve obtained is

… (xxv),

and is shown in the figure. The fit is clearly very rough, largely because of the
irregular jumps in frequency for values of (p+r)/(n+m) centred at .15 and .85.
As the distribution is based on only 300 samples and this irregular frequency 
contains 39 cases against 32 in each of the neighbouring groups, no valid con- 
clusions can be drawn from this result alone. 

There was however a similar maximum in the distribution of p+r found in
Series I (see Fig. 2), and it is possible that the feature has some real significance.
For example, it would be possible to advance the hypothesis that if a free and
unprejudiced selection of random samples could be taken from among the populations of experience, the distribution of P would be found to be a U-curve following
approximately the distribution of Fig. 3 between P = .15 and .85, and then rising
steeply to P = 0 and 1. In this event the observed shortage of cases with
(p+r)/(n+m) below .15 and above .85 would be due to my own personal equation
as an observer, to some unconscious prejudice against selecting freely for the experiment, propositions with which a very low or high value of P is associated. On
this hypothesis we might fit a portion of a Type I curve having limits at P = 0
and P = 1, to the observed frequencies in the range .15 to .85, and claim that this

* Taking the probable errors as ± .67449 times $\Sigma_1$ and $\Sigma_2$.
was the true F(P) distribution*. Such a curve has been fitted by the method of
least squares, using the logarithms of ordinates. It has the equation

… (xxvi),

and is shown in Fig. 3. It is certainly possible that in collecting material for 
this special purpose there would be a tendency not to fix on examples in which P 
is very low or very high, simply because they would not occur to one; while the 
statistician in practice dealing with material that came before him more or less by 
chance might find a more definite U-shape in his distribution F(P). Whether 
this is so, could only be shown by the test of wider experience, but I doubt that 
the difference could be so great as between my observations and the curve of 
Equation (xxvi). The area under this curve is to the area of the histogram in 
Fig. 3 as 1025 to 300, so that if the former were the typical curve the statistician 
in practice must be supposed to come across 7 additional populations with very 
high or low values of P for every 3 that I have met. 

But let us suppose that this were so, and that a statistician were to make use
of Bayes’ Theorem with F(P) constant for purposes of prediction when dealing
with a large series of populations in which F(P) was really represented by a curve
similar to (xxvi). Would his predictions be entirely invalid? It has been shown
that the introduction of a U-shaped F(P) curve will only seriously modify the
moments of the expected distribution of r in second samples when p is but a few
units. Only such cases need be considered; the values of constants, limits, etc. for
three of these taken from the 300 double samples are given in Table XVII:

No. 92. As has been shown before, a negative a gives a lower mean and standard
deviation and higher values of $\beta_1$ and $\beta_2$, or greater skewness†. In this particular
case we note a very considerable difference in the value of these constants, according
to whether a = 0 or −.66, so that the curves of distribution of r in second samples
do not correspond closely. But the statistician will not be dealing with repeated
samples in which n = 60, m = 30, p = 1, and so for most practical purposes will not
wish to make use of the whole form of the curve. He will be content to obtain
some limit of range within which he may expect r to fall. The later columns of
Table XVII give the deviations of the limits of the central 90% and 99% of area
taken from the Tables of the Appendix, and also these limits converted into terms
of frequency of “successes” in the second sample, by multiplying by $\sigma$ and adding
to the Mean. In the first place, consider the lower limits; their values are here
of no great importance, for they would not be made use of. The modal value of r
is 0‡, and the Type I curve cannot be expected to correspond exactly with the
abrupt start of the hypergeometrical histogram. In such cases therefore the lower
limit of r would naturally be taken as zero. For the upper limits of r, we have

(1) 90%: 3.3 for a = 0, 2.7 for a = −.66,
(2) 99%: 5.5 for a = 0, 4.8 for a = −.66.

* Or rather the curve when reduced so that the total area below it is unity.

† The points representing the Bayes’ hypergeometrical series lie so near to the Type III line, that
the skewness is closely proportional to $\sqrt{\beta_1}$.

‡ The first two terms of the hypergeometrical series are proportional to 1 and m(p+1)/(q+m).
These are certainly different, but those for a = −.66 are less than for a = 0;
that is to say, if we use the Bayes’ hypothesis assuming that a = 0 when we are
really sampling from populations among which P follows a distribution in which
a = −.66, the upper limits to r that we predict will be higher than their real
values, and we shall therefore be erring on the side of safety.

No. 135. In this case the curves are not quite as skew as before ; but the first 
two terms in the hypergeometrical series are 1 and 40/39, so that the Type I curve 
is still not adequate at this tail, and the lower limit of r is again 0. At the other 
tail, the upper limit is in both cases about a unit higher when a = 0 than when
a = −.66; that is to say, we should again be on the cautious side in using the
simple Bayes’ Theorem for prediction. 

No. 157. Here p is larger (=7) and, except for the Means, the two sets of 
moments have almost reached the point where the difference is quite insignificant. 
The histogram is no longer J-shaped*, the Type I curve will give a better fit at
the commencing tail and the lower limit for the 90% range is not at zero. Here
the value for a = 0 (2.4) lies within that for a = −.66 (2.1), but the difference is
not serious, and in practice, only integral values of r being possible, we should
take 2 as the lower limit in both cases. Similarly for the 99% range, we should
take 0 as the lower limit. For the upper limits, we find again that the curve for
a = 0 gives us a unit more on the safe side.

The position is therefore as follows. If we form our limits of prediction on the
assumption that a = 0, then even in the extreme case when a = −.66, there will
only be a sensible error when p is small. When this is so the upper limit of
prediction for r (whether for 90% or 99% range) will be somewhat higher than
it need be, and while the lower limit will also be a little higher†, with small
values of p this will generally be taken as zero. The position is represented
diagrammatically on an exaggerated scale in Fig. 4; the curve for a = 0 has a higher
mean, a larger standard deviation, but less skewness than that for a negative a.

The position would not be quite so satisfactory if a were a positive fraction 
and the curve F(P) as (b) of Fig. 1. Here the changes in the moments would be 
of opposite sign, since to the first order they depend on a; that is to say, in
assuming that a=0, we should base our predictions on a curve having too low a 
mean, too small a standard deviation and too great skewness. The predicted 
upper limit for r (for the 90% and 99% areas) would now be a little too low.
But at present there is no evidence that this experience is likely to be met with,
for it seems probable that the F(P) curve, if not horizontal, will be concave rather
than convex.


The particular form of distribution of F (P)—a Type I curve—has been taken 
for purposes of illustration; it should not be supposed for a moment that the 
distributions of P among the populations of other observers’ experience will be 


* The first three terms are approximately as 1, 4, 8. 
† But not as much higher as for the upper limit.
found a posteriori to follow such a curve at all closely. My own series of observations have not in fact done so. But the discussion of this illustration in
some detail serves two purposes; it shows how a possible form of divergence
from linearity in the distribution of P will affect the moments of the distributions
of r, and from the comparatively unimportant nature of these modifications gives
us confidence to apply Bayes’ Theorem to predict the limits of r in future
sampling.

Fig. 4. Diagram representing the limits of percentage frequencies in curves for
predicting the values of r in second samples. (Ranges are marked for a negative and for a = 0.)


It is now possible to consider more clearly the problem that so much exercised
the 19th century writers, that is to say, the case where p (or q) is 0. In the
particular case when F(P) was taken as a Type I U-curve it was necessary to
make the limitation that there should be no populations with P = 0 or 1 exactly,
in order to reach the integrals of (i) from the summations of (ia). But for the
case F(P) = constant this was not necessary, and in general there is no reason
why the populations sampled should not contain a finite proportion in which
P = 0 or 1. In both the present series of experiments there were very few cases
in which this could be considered to be so*, for one naturally avoided absolutely
impossible or absolutely certain propositions, realising that but little mental effort
in selecting propositions would be needed to swamp the whole of the experiment
with cases of absolute certainty or the reverse. On the other hand in practical
sampling cases will occur when P = 0 or 1 although the observer cannot tell
that this is so either a priori or a posteriori. For example, in medical work it may be
desired to test the result of a certain treatment; the observer may have good
grounds for believing that the conditions in two samples are the same, but yet be
quite uncertain a priori whether the treatment is going to have any effect or not.
On finding no cures in the first sample he will still be uncertain, but Bayes’


* There was one case I know when this was so, in which I searched in samples from the list of 
members of the Hackney Stud Society for the number of lady members, to find afterwards that they 
were included in a separate list. But even here there was a chance of a secretarial error including one 
among the men. 
Theorem will enable him to predict limits of range within which the number of
cases is likely to fall in the second sample. All that is essential in the long run
for accuracy of prediction through a series of varied tests is that among the
population sampled in these tests the distribution of P will not differ more from
linearity than in the cases that have been considered. The fact that in my
experiments there were very few cases with P = 0 or 1 exactly is of little consequence in practice, for there is no sensible difference in prediction between the
cases where P = 0 and P = .001. In the first series of experiments, it was found
that the observed distribution of r in second samples where p had been 0 in the
first sample, gave a very good fit to theory*. Further, it must be remembered
that if P = 0, both p and r will be 0, so that r will certainly fall within the
predicted limits. In fact if some of the early writers had found time to divert
themselves by examining the values of r in samples from a whole series of populations in which an event had never been known to occur before, it is true that
the Bayes’ Theorem distribution of r would not have been borne out, but had
they taken reasonable caution to ensure the stability of their populations, I think
they would have found in the long run that in at least 99% of cases r would
lie between the 99% limits of frequency obtained as suggested in this paper.
As for the cases in which n = 1 and m = 1, these hardly belong to practical
statistics. The Type I curves for distribution of r would of course be inappropriate†, yet in so far as the sampling was carried out from populations among
which F(P) varied within the limits we have considered, I see no reason to doubt
that in the long run the predictions of Bayes’ Theorem would be carried out.


9. Finally let us consider as an illustration the 20 cases selected randomly
from the 300 double samples of Series II, which are described in Table XVIII.
Take the first as an example: in the Euston Road, on a certain day, out of
200 men passed, 26 have been observed to carry umbrellas; within what limits
shall we expect to find the number falling in the next 150 that are observed?
From n, m and p we calculate the moments of the distribution of r from Equations
(iv); the linear relation between $\beta_1$ and $\beta_2$ of (xix) then forms a useful check on
the computation. Next we enter the Tables of the Appendix with $\beta_1$ and $\beta_2$ and
find the deviations from the mean,

(a) for the central 90% of area (Tables (1) and (2)) −1.55, +1.74,
(b) for the central 99% of area (Tables (3) and (4)) −2.24, +2.87.

Then the corresponding limits of range are

(a) $\nu_1' - 1.55\sigma = 11.5$, $\nu_1' + 1.74\sigma = 29.6$,
(b) $\nu_1' - 2.24\sigma = 7.8$, $\nu_1' + 2.87\sigma = 35.8$.

That is to say in terms of probability the chances are at least 9 to 1 that r
will lie between 11 and 30, and at least 99 to 1 that it will lie between 7 and 36.
Actually the observed value was 13.
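
The arithmetic of this example can be retraced in a few lines. The sketch below is an illustration only: it assumes that the mean and variance implied by Equations (iv) are m(p+1)/(n+2) and (p+1)(q+1)m(n+m+2)/((n+2)^2 (n+3)), and it takes the four tabled deviations quoted above as given, since the Appendix tables are not reproduced here. The function name is hypothetical.

```python
from math import sqrt


def bayes_prediction_moments(n, p, m):
    """Mean and standard deviation of r, the count in a second sample of m,
    after p occurrences in a first sample of n (F(P) taken as constant).
    The two formulas are assumed forms of Equations (iv)."""
    q = n - p
    mean = m * (p + 1) / (n + 2)
    variance = (p + 1) * (q + 1) * m * (n + m + 2) / ((n + 2) ** 2 * (n + 3))
    return mean, sqrt(variance)


# Euston Road example: 26 umbrellas among 200 men, next sample of 150.
mean, sigma = bayes_prediction_moments(n=200, p=26, m=150)

# Deviations (in units of sigma) quoted in the text from the Appendix tables.
limits_90 = (mean - 1.55 * sigma, mean + 1.74 * sigma)
limits_99 = (mean - 2.24 * sigma, mean + 2.87 * sigma)

print(round(mean, 2), round(sigma, 2))
print([round(x, 1) for x in limits_90])
print([round(x, 1) for x in limits_99])
```

With these inputs the rounded limits agree with the values 11.5, 29.6, 7.8 and 35.8 stated in the text.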





* See Tables IV and VI. 
† For n and m between 1 and 15 the terms of the hypergeometrical series can be readily computed
or use may be made of Table XLVIII in Tables for Statisticians and Biometricians.









































If a more rapid but approximate method of prediction is required, we can
calculate only $\nu_1'$, $\sigma$ and $\beta_1$, and assuming the curve of distribution of r to be
adequately represented by Type III*, take as the deviations from the Appendix
Tables the bottom figures in the appropriate $\beta_1$ column. Or more roughly still
we may compute only $\nu_1'$ and $\sigma$, and assuming the curve of r to be Normal, take
the limits of range at

$\nu_1' \pm 1.6449\sigma$ = 11.0 and 29.1; and $\nu_1' \pm 2.5758\sigma$ = 5.9 and 34.2.


It was seen that this last method gave satisfactory results for the 300 double
samples†; but we cannot be certain that it would invariably do so and the wiser
course is to make use of the full rule which at any rate on theoretical grounds is 
definitely more accurate in the long run. 
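
The rougher Normal rule is easily checked numerically. The following sketch assumes the same mean and variance expressions as Equations (iv), namely m(p+1)/(n+2) and (p+1)(q+1)m(n+m+2)/((n+2)^2 (n+3)), and obtains the multipliers 1.6449 and 2.5758 from the Normal curve itself; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def normal_prediction_limits(n, p, m, central):
    """Limits enclosing the stated central proportion of r, treating the
    distribution of r as Normal with the Bayes mean and standard deviation."""
    q = n - p
    mean = m * (p + 1) / (n + 2)
    sigma = sqrt((p + 1) * (q + 1) * m * (n + m + 2) / ((n + 2) ** 2 * (n + 3)))
    # Normal deviate leaving (1 - central)/2 in each tail.
    k = NormalDist().inv_cdf(0.5 + central / 2)
    return mean - k * sigma, mean + k * sigma


# Umbrella example: n = 200, p = 26, m = 150.
low_90, high_90 = normal_prediction_limits(200, 26, 150, 0.90)
low_99, high_99 = normal_prediction_limits(200, 26, 150, 0.99)
print(round(low_90, 1), round(high_90, 1))
print(round(low_99, 1), round(high_99, 1))
```

For the umbrella example this reproduces the limits 11.0 and 29.1, and 5.9 and 34.2, given above.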

The full rule has therefore been employed in calculating the 90% limits
given in Table XVIII. It will be seen that in no case does r fall outside these,
although we should have been prepared to find this occur in 2 cases (or 10%).
There is no need to discuss the results in detail, but it should be specially noted
how, in those cases where the first sample is smaller than the second, Bayes’
Theorem provides a very large standard deviation, so allowing for the greater
uncertainty of prediction. In the two cases where the lower limit is negative and
given in brackets, it would be naturally taken as at 0. As for the complete 300
cases, a measure of the agreement of observation with theory may be obtained
from the moments of the distribution of $z = (r - \nu_1')/\sigma$. The values of z are given
in the last column of the Table. Using the previous notation‡,

$M_1$ = +.0289, $M_2$ = .5652,
while the expected values are .0000 ± .1508 and 1.0000 ± .2355.


Thus the difference between the observed and “expected” values of r is in the
long run neither too great nor too small, while the actual variation about the
expected values is exceptionally low, although I think no significance can be
attached to this. The method of random selection employed in nearly all the
examples of Table XVIII has been by alphabetical arrangement. This may
appear a form of selection not so very different from drawing balls out of a bag,
and it might perhaps be thought that so much stability will not be found among
the populations which we are sampling in practical work. To test this point
I have analysed the double samples Nos. 1-52, which were all collected in the
street, or in trains, buses, etc., that is to say without any form of mechanical
selection§. In these 52 cases, r lies 3 times outside the limits of the central



* For my particular series of observations the values of $\beta_2$ given by the regression line (xxi) would
be more accurate in the long run.
† p. 419.
‡ pp. 420-421, taking the probable errors of $M_1$ and $M_2$ as ± .67449 $\Sigma_1$ and ± .67449 $\Sigma_2$, and obtaining
$\Sigma_2$ by using for Mean $\beta_2$ the mean value for all 300 cases, i.e. 3.4388.
§ Here are some examples of characters observed: No. 1, Bus drivers with moustaches; No. 5, Men
wearing glasses in the street; No. 18, Morris cars among all private cars; No. 36, Men in the Tube
reading newspapers; No. 43, Men without collars met in passing along a certain street; No. 50, Copper
coins with King Edward’s head on.
90% area, when the average in repeated series of 52 would be 5.2 times; and
never outside the limits of the 99% area. For the z distribution

$M_1$ = +.0024, $M_2$ = .9754,

against expected values of .0000 ± .0935 and 1.0000 ± .1461,

or $M_1$ and $M_2$ are extraordinarily close to the “expected” values, and there is no
suggestion of instability.
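
The probable errors attached to the expected values of the two moment tests can be recovered as follows. This is a sketch under an assumption not spelled out in the text: that the standard errors are 1/sqrt(N) for the first moment and sqrt((Mean beta_2 - 1)/N) for the second, with Mean beta_2 = 3.4388 as stated in the footnote. These forms do reproduce the figures printed for the groups of 129, 20 and 52 double samples.

```python
from math import sqrt

PROBABLE_ERROR_FACTOR = 0.67449
MEAN_BETA_2 = 3.4388  # mean beta_2 over all 300 cases, from the footnote


def probable_errors(n_cases, beta_2=MEAN_BETA_2):
    """Probable errors of the first and second moments of z about their
    expected values 0 and 1, for a group of n_cases double samples.
    The two standard-error forms are assumptions (see lead-in)."""
    pe_first = PROBABLE_ERROR_FACTOR / sqrt(n_cases)
    pe_second = PROBABLE_ERROR_FACTOR * sqrt((beta_2 - 1) / n_cases)
    return pe_first, pe_second


for n_cases in (129, 20, 52):
    print(n_cases, [round(v, 4) for v in probable_errors(n_cases)])
```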


The distribution of values of (p+r)/(n+m) among the 20 double samples of
Table XVIII is as follows: between 0.0 and 0.1, 6; between 0.1 and 0.2, 4;
between 0.2 and 0.3, 4; between 0.3 and 0.4, 3; between 0.4 and 0.5, 3. This
suggests a slightly U-shaped distribution of F(P).


10. What then is the final position? Looking at the problem from the point
of view of frequency rather than probability, I have tried to show the position
which I believe Bayes’ Theorem should take in the field of practical sampling.
I do not start from the point of view of an a priori distribution of probabilities,
nor do I suppose that the statistician in sampling a population is entirely ignorant
of its contents. The exact form of the distribution, F(P), seems to be something
which each statistician can only determine for himself a posteriori by an examination of his own statistical experience, and Bayes’ Theorem can only be accepted
as providing a valuable working rule for prediction, on the assumption that among
the problems with which most statisticians are confronted there is in fact a distribution of values of P between 0 and 1 whose difference from linearity is of the
same order as that observed in the experiments that have been discussed in this
paper. This assumption does not appear to be unreasonable, but no final conclusions can be reached until evidence has been collected from a wider source*.
Again I would not suggest that the theorem should be applied to problems where
we are in complete ignorance of everything but the values of n, p and m. Sufficient
must be known of the factors influencing the character of the population to provide
good grounds for supposing that the conditions have not changed between the
drawing of the first and second sample. Certain risks must be taken or no predictions could ever be made; but the acceptance of these risks, which fall outside
the scope of reasoning of the mathematical theorist, is justified by past experience
in the eyes of the practical worker.

Again, the observer may sometimes have relevant information enabling him 
to alter or narrow his prediction limits for r. He may know for some reason that 
the p in his first sample is particularly far from the modal value which would be 
observed in repeated samples. Just as some Superman who followed the shakings 
of his bag of balls might tell by the aid of dynamics, of elasticity and of geometry 
the exact constitution of a particular sample of balls, so in our problems we may 
sometimes have information which tells us which, out of the universe of possible 
samples, our particular sample may be. But this does not detract from the value 
of Bayes’ Theorem as providing an accurate mass result when applied to a great 

* One feels inclined to urge statisticians to apply Bayes’ Theorem wherever possible in their work,
and to record and compare their results.
range of problems, and how often too is that supposed relevant information unreliable or in a shape incapable of being put into numerical form! Take for
instance the 20 examples of Table XVIII; would it have been possible to fix
by any other rule, limits of no wider range within which nine-tenths of the values
of r would have been found to fall? It is true that these particular 20 questions
have a somewhat academic ring which can hardly be avoided in a large scale ad
hoc experiment, yet surely in our everyday work we do often have to sample
populations in which there is as much reason to expect stability as in the successive pages of a book or at the two ends of a street? On this point the reader
must form his own final judgment, but if he allows that the experiments are
typical and that the conclusions that have been drawn are correct, then he must
agree that the results have a wider bearing on the whole question of probable
errors as well as on the particular problem of Bayes’ Theorem. For they provide
further evidence to confirm our belief in the stability of statistical ratios in the
world of experience upon which so much practical work depends.


In the special problem, as throughout the whole field of statistics, it is perhaps
necessary to allow a slightly greater latitude in variation than the theory of
probable errors based on the “bag of balls” method of sampling permits. But this
allowance is not great; at worst, as in the first series of experiments, we find
a general Goodness of Fit result of $P(\chi^2)$ = .05 where we should have liked to
find .50; while at best we have the values of $M_1$ and $M_2$ in the 52 cases of
street sampling: .0024 and .9754 against theoretical values of .0000 ± .0935 and
1.0000 ± .1461.


11. The inverse of Bayes’ Theorem. There is a problem which is often met with
where the position is the inverse of that which we have considered. After finding
a frequency of occurrence, p, of a certain character in a sample of n, a frequency, r,
of the same character is observed in a second sample of m; what measure can be
given to the probability that the two samples come from the same population?
Suppose that the test is applied on N occasions to samples drawn from populations
among which the distribution of P is such that the simple Bayes’ Theorem can
be justifiably applied. From the observed values of n, m and p we can compute
in each case a limiting range outside which only k% of the values of r should lie
on the Bayes’ hypothesis if both samples were drawn at random from the same
population. Suppose out of the N cases,

(1) in A, the two samples have been drawn from the same population, and that
in a of these r lies outside the k% limits;

(2) in B, the two samples have been drawn from different populations, and
that in b of these the value of r lies outside the limits. B will not equal b, for
some good fits will occur when sampling from different populations.

Now if the test is applied indiscriminately to all pairs of samples observed,
then when A is large 100a/A → k. If k be so small that we decide to take the risk
of rejecting all those double samples in which the value of r lies outside the k%
limits, as almost certainly coming from different populations, then we shall be
wrong in this assumption in a proportion of a/(A+B) or a/N of the total cases,
which is less than k%. This is a definite criterion giving an outside limit of error.
On the other hand in accepting double samples where r lies within the limits, as
probably all having both samples from a single population, we shall be wrong in
a proportion (B−b)/N of the total cases examined. This may be a very small
quantity, but we have no means of measuring the relations of B, b and N to one
another, for they depend to some extent on what is meant by the term “same
population,” and will vary according to the proportions of A and B among our tests.

It is important also to be clear regarding the limit of error risked by rejecting
cases where the observed value of r is outside the limits of the central k% frequency of the r-distribution curve. In terms of observed frequency the rule will
only be found accurate in the long run, if the test is applied to all double samples
whether the correspondence of p and r is bad or good. For suppose the test were
only to be applied where a bad fit was evident from inspection, then the proportion
a/(A+B) would really be nearer a/(a+b), which is a quantity to which we can
give no measure, and is likely to be far larger than k%. On applying the test,
the observer may find it extremely unlikely that a certain frequency r should be
found in a sample from the same population that has previously given him a
frequency p; yet supposing he has been studying double samples from very many
varied populations but has not applied the test until this occasion when he finds
an obviously discordant pair, then this improbability vanishes. The position is
perhaps obvious, but I think it is sometimes overlooked. If, to take a different
example, in our statistical experience, we only apply the $\chi^2$ Test for Goodness
of Fit to cases where from inspection the fit looks unsatisfactory, then clearly we
must be very careful what deductions we draw. For we might obtain in nearly
100% of tests, fits with $P(\chi^2)$ less than 0.1, where yet in every case the sample
had been drawn from the supposed population.
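
The three proportions discussed in this section may be set side by side in a small numerical sketch. The counts used are purely hypothetical, chosen only to show how a/N stays below k% when every pair is tested, while a/(a+b), the proportion that applies when only discordant-looking pairs are tested, may be many times larger.

```python
def inverse_test_proportions(A, a, B, b):
    """Error proportions for the test of whether two samples share a population.

    A: pairs truly from the same population, of which a fall outside the limits.
    B: pairs truly from different populations, of which b fall outside the limits.
    """
    N = A + B
    return {
        "wrongly_rejected": a / N,            # same population, but rejected
        "wrongly_accepted": (B - b) / N,      # different populations, but accepted
        "rejected_same_share": a / (a + b),   # share of rejections that were wrong
    }


# Hypothetical illustration with k = 1%: 900 same-population pairs, 100 different.
rates = inverse_test_proportions(A=900, a=9, B=100, b=60)
for name, value in rates.items():
    print(name, round(value, 4))
```

In this made-up case a/N is 0.9%, below k = 1%, whereas 13% of the rejected pairs in fact came from a single population.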

It is of course the old difficulty; once or twice in a thousand times an event 
will occur against which the odds are 999 to 1, but when it does occur we are 
inclined to think that something is wrong, focusing our attention on the single 
event and forgetting the 999 times when we have observed that it did not happen. 
And so in practical statistics it seems to me that measures of probability can
only be usefully interpreted by a reference to statistical frequency, whether recorded in the past or to be expected in the future. And it is from this point of
view that I have tried to examine Bayes’ Theorem throughout the paper. 


Appendix, containing Tables of Deviates of Type I Curves. 


The general Type I curve

$$y = y_0 \left(1 + \frac{x}{a_1}\right)^{m_1} \left(1 - \frac{x}{a_2}\right)^{m_2} \qquad \text{(i)},$$

can be transformed by a change of origin and scale to the form

$$y = y_0\, x^{s-1} (1 - x)^{t-1} \qquad \text{(ii)}.$$

If we write

$$B_x(s, t) = \int_0^x x^{s-1}(1-x)^{t-1}\,dx \bigg/ \int_0^1 x^{s-1}(1-x)^{t-1}\,dx \qquad \text{(iii)},$$

then $B_x(s, t)$ is the proportion of the total frequency under the curve lying between
the tail at which the origin lies and the ordinate at a distance x therefrom, or the
Incomplete Beta-Function.


The four Tables appended give for curves with different values of $\beta_1$ and $\beta_2$
the deviations from the mean to the ordinates for which (1) $B_x(s, t)$ = .05, (2) .95,
(3) .005, (4) .995, the unit of measurement being the standard deviation of the
curve. Thus when we know the mean, the standard deviation and the moment
coefficients $\beta_1$ and $\beta_2$ of a particular Type I curve falling within the range of the
tables, Tables (1) and (2) give the limits of the “central” 90% of frequency, and
Tables (3) and (4) of the “central” 99%. The curves are supposed drawn with
positive skewness so that the deviations of (1) and (3) are really negative, those
of (2) and (4) positive, the smaller deviations lying to the steep side of the curve.
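
A tabled deviate of this kind can be approximated directly by quadrature. The sketch below is not the method by which the tables were built; it assumes the curve is written as x^(s-1) (1-x)^(t-1) on the range 0 to 1 with s and t not less than 1, integrates by Simpson's rule, and locates the limiting ordinate by bisection. All function names are hypothetical.

```python
from math import sqrt


def simpson(f, lo, hi, steps=2000):
    """Composite Simpson's rule with an even number of steps."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def incomplete_beta_ratio(x, s, t):
    """Proportion of the area of x^(s-1) (1-x)^(t-1) lying between 0 and x."""
    kernel = lambda u: u ** (s - 1) * (1 - u) ** (t - 1)
    return simpson(kernel, 0.0, x) / simpson(kernel, 0.0, 1.0)


def deviate(prob, s, t):
    """Deviation from the mean, in units of the standard deviation, of the
    ordinate cutting off the proportion `prob` from the tail at the origin."""
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection on the monotone area ratio
        mid = (lo + hi) / 2
        if incomplete_beta_ratio(mid, s, t) < prob:
            lo = mid
        else:
            hi = mid
    x = (lo + hi) / 2
    mean = s / (s + t)
    sd = sqrt(s * t / ((s + t) ** 2 * (s + t + 1)))
    return (x - mean) / sd


# A positively skew curve: the lower deviate is the smaller in magnitude.
lower, upper = deviate(0.05, 2, 6), deviate(0.95, 2, 6)
print(round(lower, 2), round(upper, 2))
```

For a positively skew curve such as s = 2, t = 6 the lower deviation comes out negative and smaller in magnitude than the upper, as the text describes.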


The field covered by the tables is shown in the $\beta_1$, $\beta_2$ diagram of Fig. 5; it
was chosen to cover the range of Type I curves likely to be met with in the 
application of Bayes’ Theorem, but pending the construction of tables of the 
Incomplete Beta-Function, the tables should be of use in other problems. 

For example, for the distribution represented by the expansion of the binomial
$(p + q)^n$,

$$\beta_1 = \frac{1 - 4pq}{npq}, \qquad \beta_2 = 3 + \frac{1 - 6pq}{npq},$$

so that for a given value of n, eliminating p and q,

$$\beta_2 - \beta_1 - 3 + \frac{2}{n} = 0 \qquad \text{(iv)}.$$

That is to say, for a given n the points in the $\beta_1$, $\beta_2$ field representing the
binomial for varying p’s will lie on a fixed straight line parallel to the Poisson
line $\beta_2 - \beta_1 - 3 = 0$ and cutting the $\beta_2$ axis at the point $3 - 2/n$. The Poisson line,
which is shown in Fig. 5, divides the field of positive and negative binomials.
If n be not too small, we may take the Type I curve with the same values of $\beta_1$
and $\beta_2$ as providing a good representation of the binomial, and the four Tables
will therefore give the limits within which the central 90% and 99% of frequency
lie for a large range of positive and negative binomials.
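
Relation (iv) can be confirmed numerically by computing the moment coefficients of a binomial directly from its terms. The sketch below is illustrative only; the function name and the particular values of n and p are arbitrary choices.

```python
from math import comb


def binomial_betas(n, p):
    """beta_1 and beta_2 of the binomial (p + q)^n, computed from its terms."""
    q = 1 - p
    terms = [comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]
    mean = sum(k * w for k, w in enumerate(terms))
    mu2, mu3, mu4 = (
        sum((k - mean) ** j * w for k, w in enumerate(terms)) for j in (2, 3, 4)
    )
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2


for n, p in [(10, 0.1), (25, 0.3), (50, 0.05)]:
    beta_1, beta_2 = binomial_betas(n, p)
    # The last column should vanish for every p when n is fixed.
    print(n, p, round(beta_1, 4), round(beta_2, 4),
          round(beta_2 - beta_1 - 3 + 2 / n, 10))
```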


Again, there are many problems in which the distribution in samples, of a 
mean or other frequency constant, is represented, not by the Normal Curve, but by 
a skew curve whose moments are known theoretically ; within the range covered, 
the tables will here provide a criterion corresponding to the “probable error,” 
enabling a judgment to be formed on the probability of the occurrence of the 
observed constants in a random sample from the assumed population. It should
be noted that the limits of the central 90% of frequency correspond in the case
of the Normal Curve to a deviation of ± 2.44, and of the central 99% to ± 3.82
times the probable error.

The tables were computed as follows: 


For values of $\beta_1$ between 0.0 and 1.4 a frame of points was taken along the
parallel lines

… (v),

… (vi),

$2\beta_2 - 3\beta_1 - 6 = 0$ (vii).

For values of $\beta_1$ between 1.4 and 3.0 the lines used were

… (viii),

… (v) bis,

$2\beta_2 - 3\beta_1 - 6 = 0$ (vii) bis.

The computation of these frame values along the lines (v), (vi) and (viii) was
carried out either by quadrature, or in certain cases where the tail of the curve
rose very steeply, by the expansion of (ii) in powers of $x$. In applying quadrature,
the areas from the tail to four equidistant ordinates in the neighbourhood of the
desired limit (bounding the .05 or .005 tail section) were calculated, and the exact
position of the limiting ordinate found by backward interpolation. Equation (vii) 
is that of the Type III line, and in the great majority of cases the frame values 
along this line could be found from the Table of the Incomplete Gamma-Function. 

About 35 curves were used in all. The frame values of the deviations were 
calculated to three places of decimals, but as the differences were in some cases still
large, it was not possible, without greatly extending the work by adding further 
lines to the frame, to calculate the interpolated values to this degree of accuracy 
throughout. The tables therefore go only to two decimal places, and in a certain 
number of cases a unit error may occur in the last place, but this is not of serious 
consequence for the purpose for which they are intended. 

The very slow variation, as $\beta_1$ and $\beta_2$ are changed, in the sums of the two
deviations, or the range of the central 90% and 99% frequency areas, provided
a useful check on the computation. For example, along the Type III line this
range for the 90% area decreases only from 3.29σ to 3.02σ as $\beta_1$ changes from
0.0 to 3.0, while for the 99% area the range increases from 5.15σ to 5.23σ*. In the
neighbourhood of this line, (vii), the skewness increases approximately as $\sqrt{\beta_1}$, and
the tables show clearly how the deviations alter rapidly for low values of $\beta_1$, but
far more slowly later on. Beyond the range of the tables the deviations alter
very slowly, and it was found that an extension of Tables (1) and (2) would give
the deviations for the 90% area

at $\beta_1 = 4.0$, $\beta_2 = 7.8$ as − .81σ and + 2.11σ,
at $\beta_1 = 4.0$, $\beta_2 = 9.0$ (the Exponential Point) as − .95σ and + 2.00σ.
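The values quoted for the Exponential Point can be reproduced directly, since the exponential curve has a closed-form quantile function and equal mean and standard deviation. This is a minimal sketch for the unit exponential; the function name is illustrative.

```python
import math


def exponential_deviation(lower_tail_area):
    """Deviation from the mean, in sigma units, of the point that cuts off
    the given lower-tail area of the unit exponential (mean = sigma = 1)."""
    quantile = -math.log(1.0 - lower_tail_area)
    return quantile - 1.0


lower = exponential_deviation(0.05)   # lower limit of the central 90%
upper = exponential_deviation(0.95)   # upper limit of the central 90%
central_range = upper - lower         # equals ln(19)
```

The range is about 2.94σ, close to the 3.02σ reached at the upper end of the tables, which agrees with the remark that the deviations alter very slowly beyond them.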


* It first falls very slightly to a minimum value at about $\beta_1 = 0.4$.
MUTUALLY CONSISTENT MULTIPLE REGRESSION
SURFACES. 


By Dr BURTON H. CAMP. 


(1) Introductory. It is my purpose to study some of the properties of multiple 
correlation solids in which there are three variates only, x, y, and z, with special 
reference to the most probable forms of their regression surfaces and total regres- 
sion curves. It will be desirable, incidentally, to develop certain properties of 
correlation surfaces in which the frequency is a function of two variates only. 


As soon as one departs slightly from the normal case, in which all the regres- 
sion surfaces are plane, and all the total regression curves are lines, it is quite 
customary to assume that the surfaces are polynomials of the second or higher 
degree; e.g., the regression of $z$ on $xy$ may be written:

$$z = g_0 + g_1 x + g_2 y + g_3 xy + g_4 x^2 + g_5 y^2 \tag{1}$$

The values of the g’s in a numerical case are then usually determined by least 
squares. It is frequently assumed that the total regression curves also are
polynomials; e.g., the total regression of $z$ on $x$ may be assumed to be a polynomial of
the second degree or higher in $x$. It takes but brief consideration, however, to
assure one that if mathematical rigour is to be preserved such assumptions must 
not be made in random fashion. There are inter-relations among these regression 
surfaces and curves: if some are prescribed, others are determined. Further, even 
when the remaining regressions are not thus rigidly determined mathematically, 
they often are so determined save for a remote possibility which one is quite 
willing to assume non-existent. For instance, suppose the three-way frequency 
solid were required to become two-dimensional only, i.e., a mathematical surface. 
This would be a possibility that one would be unwilling to admit, for one would 
usually be sure that there was real variation with respect to all three of the 
variates, not merely with respect to two. 


It becomes important, therefore, to separate out from among the various
conceivable combinations that could be made a few simple, self-consistent groups.
One such group might contain the above equation (1) as the regression of z on 
x,y. What simple assumptions could be made with regard to the other regression 
surfaces, and with regard to the total regression curves, which would not conflict 
with that equation or with each other? A few such cases will be described in 
Section (4). It is, however, rather the method used than the actual cases described 
which will be found useful. As a further example of the method, a detailed study 
is made of a frequency solid used by Dr L. Isserlis for which the data were supplied 
by Miss Elderton. 


A study of the various cases leads to the following conclusion. If the regression
surfaces are to be simple polynomials, but not planes, then a common form for the
total regression to take will be:

$$y = \frac{\text{polynomial in } x}{\text{polynomial in } x} \tag{2}$$

and very often it will be much more complicated than this. Now it will be
recognized that in almost every two-way correlation the frequencies are in reality 
the totals of an m-way correlation regression solid. The variation with regard to 
the other (m — 2) variables may not have been studied, and so the data may not 
be available to describe that solid, but it will be recognized that this variation 
and that solid do exist. So then, not to progress beyond the three-way solid for 
the moment, we may say that in general two-way regression is in reality a total 
regression for a three-way solid. Our study seems to imply, then, that the form (2) 
should occur frequently in the study of two-way correlation. This point is con- 
sidered more fully in Section (6). 

In Section (6) brief mention is made of the implications involved when we 
regard two-way correlation as one of the total correlations in a solid of more than 
three dimensions. Finally, further possible applications of the results are suggested, 
especially in the field of business forecasting. 

This paper was begun while the author was at the Galton Laboratory. It is a 
pleasure to record his gratitude to Professor Karl Pearson for suggestions and 
kindly criticism. 

(2) Two Variates. It is necessary first to formulate certain properties of
correlation surfaces in which the frequency $n(x, y)$ is a function of two variates
only, $x$ and $y$. Most, if not all, of these have been observed before. They are
easily derivable from the definitions. In making a correlation table, one divides
the $xy$-plane into cells of area* $\Delta x\,\Delta y$. Then $n(x, y)$ is the frequency in the cell
whose coordinates are $(x, y)$. For our purposes it seems to be a trifle more
convenient to use the language of integrals rather than of sums, and so, to be
consistent, one may think of a correlation table as a plane slab of non-uniform
density, and of $n(x, y)$ as the mass (frequency) "at the point" $(x, y)$. Let $n$
represent the total mass (total frequency). Then


$$n(x) = \int n(x, y)\,dy, \qquad n(y) = \int n(x, y)\,dx, \qquad n = \int n(x)\,dx = \int n(y)\,dy \tag{1}$$

If the origin be chosen at the "centre of gravity" of the slab, so that

$$\int n(x)\,x\,dx = \int n(y)\,y\,dy = 0 \tag{2}$$

then

$$\sigma_x^2 = \frac{1}{n}\int n(x)\,x^2\,dx = \frac{1}{n}\iint n(x, y)\,x^2\,dx\,dy, \qquad \sigma_y^2 = \frac{1}{n}\int n(y)\,y^2\,dy = \frac{1}{n}\iint n(x, y)\,y^2\,dy\,dx \tag{3}$$


* Some authors use the word cell to mean the area $y\,\Delta x$.
Let the units be so chosen that $\sigma_x = \sigma_y = 1$. As usual, the product moments and
simple moments are

$$q_{x^a y^b} = \frac{1}{n}\iint n(x, y)\,x^a y^b\,dx\,dy, \qquad \mu_{x^a} = q_{x^a y^0}, \qquad \mu_{y^b} = q_{x^0 y^b} \tag{4}$$

and in particular

$$r = q_{xy} = \frac{1}{n}\iint n(x, y)\,xy\,dx\,dy \tag{5}$$

Then

$$\gamma(x) = \frac{1}{n(x)}\int n(x, y)\,y\,dy \ \text{ is the regression of } y \text{ on } x, \qquad \gamma'(y) = \frac{1}{n(y)}\int n(x, y)\,x\,dx \ \text{ is the regression of } x \text{ on } y \tag{6}$$

The partial $a$th moment about the mean of the column whose abscissa is $x$,
i.e., about the regression curve, is

$$\mu_{y^a . x} = \frac{1}{n(x)}\int n(x, y)\,(y - \gamma)^a\,dy \tag{7}$$

and the corresponding partial $a$th moment about the $x$-axis is

$$\mu'_{y^a . x} = \frac{1}{n(x)}\int n(x, y)\,y^a\,dy \tag{8}$$


These partial moments are frequently spoken of as moments of arrays*; the 
word partial is used here because it is consistent with the language to be used 
with three-way solids. By equations (7) and (8) 


$$\mu'_{y^a . x} = \mu_{y^a . x} + a\gamma\,\mu_{y^{a-1} . x} + \frac{a(a-1)}{2!}\,\gamma^2\mu_{y^{a-2} . x} + \cdots + \gamma^a, \quad \text{and } \mu_{y . x} = 0 \tag{9}$$

It follows that, if $\gamma(x)$ be a polynomial of the $p$th degree in $x$, $\mu'_{y^a . x}$ is in
general of the $ap$th or higher degree. It will be of the $ap$th degree even when
the $\mu$'s, i.e., the moments about the regression curve, are constants, a condition
which is sometimes described as "complete homoscedasticity in the $x$-direction."
But it will also be of the $ap$th degree often when these $\mu$'s are not constants,
provided $\mu_{y^{a-k} . x}$ be of degree $ap/k$ or less, for each $k < a$. Of course $\mu'_{y . x} = \gamma(x)$.
We define $\mu_{x^a . y}$ and $\mu'_{x^a . y}$ in manner analogous to equations (7) and (8);
$\mu'_{x . y} = \gamma'(y)$. As usual, let

$$\beta_1 = \mu_3^2/\sigma^6, \qquad \beta_2 = \mu_4/\sigma^4 \tag{10}$$
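Equation (9) is the binomial expansion of $y^a = \{(y - \gamma) + \gamma\}^a$ averaged over one array. The sketch below checks it numerically on a small correlation table; the table, the coordinate values and the function names are invented for illustration, and sums stand in for the integrals of the text.

```python
from math import comb

# Hypothetical correlation table: table[i][j] is the frequency at (xs[i], ys[j]).
xs = [-1.0, 0.0, 2.0]
ys = [-2.0, -1.0, 0.5, 3.0]
table = [[4, 6, 3, 1], [2, 5, 7, 2], [1, 2, 6, 5]]


def moment_about_axis(array, a):
    """Partial a-th moment of one array about the x-axis, as in equation (8)."""
    return sum(f * y ** a for f, y in zip(array, ys)) / sum(array)


def moment_about_mean(array, a):
    """Partial a-th moment of one array about its own mean, as in equation (7)."""
    mean = moment_about_axis(array, 1)
    return sum(f * (y - mean) ** a for f, y in zip(array, ys)) / sum(array)


def axis_moment_from_equation_9(array, a):
    """Right-hand side of equation (9): binomial expansion in the array mean."""
    mean = moment_about_axis(array, 1)
    return sum(comb(a, k) * mean ** k * moment_about_mean(array, a - k)
               for k in range(a + 1))
```

The term with $k = a - 1$ contributes nothing, because the first moment about the array mean is zero, which is the side condition stated with (9).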


* Standard notation is followed very nearly. It is necessary in this paper to indicate the variable or 
variables of which each parameter is a function. This is done sometimes by the letters in the parenthesis 
following the parameter, sometimes by the letters following the dot in the subscript. It is also desirable 
to indicate the direction of the array to which the parameter pertains. This is done by the letter before 
the dot in the subscript, or by the whole subscript when it contains no dot. Thus $n(x, y)$ is a function
of $x$ and $y$; $n(x)$ is a function of $x$, and it might have been written $n_{y . x}$, since it is the total of a $y$-array
and $x$ is the coordinate of this array; $\sigma_x$ is a constant and pertains to the marginal total which extends
in the $x$ direction. It will be observed that by "array" is meant either a column or a row.
The following statements, already well known, may now be easily established:

(a) If the regression, $y$ on $x$, is linear, $\gamma(x) = rx$, $q_{xy} = r\mu_{x^2} = r$, $q_{x^2 y} = r\mu_{x^3}$,
$q_{x^3 y} = r\mu_{x^4}$, etc.

(b) If also $\mu_{y^2 . x}$ is constant, $\mu_{y^2 . x} = 1 - r^2$; and if further $\mu_{y^3 . x}$ is constant,
$\mu_{y^3 . x} = \mu_{y^3} - r^3\mu_{x^3}$; and if further $\mu_{y^4 . x}$ is constant, $\mu_{y^4 . x} = \mu_{y^4} - 6r^2(1 - r^2) - r^4\mu_{x^4}$.

(c) If both regressions are linear, and if $\mu_{y^2 . x}$ and $\mu_{x^2 . y}$ are constants,
$\mu_{x^3} = \mu_{y^3} = 0$;
i.e., the $\beta_1$'s of the marginal totals vanish (unless $r = 0$, 1, or − 1).

(d) If, in addition to (c), $\mu_{y^3 . x}$ and $\mu_{x^3 . y}$ are constants, they equal zero, and
$\mu_{x^4} = \mu_{y^4} = 3$, i.e. the marginal totals are of normal type ($\beta_1 = \beta_2 - 3 = 0$), and the
$\beta_1$ of each column and each row is zero (unless $r = 0$, 1, or − 1).

(e) If, in addition to (d), $\mu_{y^4 . x}$ and $\mu_{x^4 . y}$ are constants (homoscedasticity up to
and including the $\beta_2$'s), each column and each row is of normal type (unless $r = 0$,
1, or − 1).

It is desirable that the reader have these results in mind when forming a 
judgment as to the reasonableness of certain hypotheses to be made later. It will 
be noted that, if the marginal totals are not of symmetrical type ($\beta_1 = 0$), one
must not assume linear regression and constant $\sigma$'s for parallel arrays*. If the
marginal totals are not of normal type ($\beta_1 = \beta_2 - 3 = 0$), one must not assume
linear regression and constant second and third partial moments for parallel 
arrays. 

The reasoning which leads to these conclusions does not, however, compel 
similarly severe restrictions, so far as the first four moments are concerned, after 
one has departed from linear regression or from homoscedasticity. To show this I 
will now demonstrate (e), and then consider also the following case. 

Case (f). Let both regressions be parabolic, and let the partial $\sigma$'s of parallel
arrays be constants.

Proof of (e). By hypothesis, $\mu_{y^4 . x} = C$, $\mu_{x^4 . y} = D$. By equations (8) and (9) the
first of these statements may be written in the equivalent form:

$$\frac{1}{n(x)}\int n(x, y)\,y^4\,dy = C + 6r^2x^2(1 - r^2) + r^4x^4,$$

which is identically true for all values of $x$. We may therefore multiply both sides
by $n(x)\,dx/n$ and integrate, obtaining $\mu_{y^4} = C + 6r^2(1 - r^2) + r^4\mu_{x^4}$. But by (b),

$$C = \mu_{y^4} - 6r^2(1 - r^2) - r^4\mu_{x^4} = 3(1 - r^2)^2,$$

by use of (d). Therefore

$$\frac{\mu_{y^4 . x}}{\mu^2_{y^2 . x}} = \frac{3(1 - r^2)^2}{(1 - r^2)^2} = 3. \qquad \text{Similarly } \frac{\mu_{x^4 . y}}{\mu^2_{x^2 . y}} = 3.$$

Case (f). By hypothesis,

$$\gamma = A + Bx + Cx^2, \qquad \gamma' = A' + B'y + C'y^2,$$
$$\mu_{y^2 . x} = D, \qquad \mu_{x^2 . y} = D'.$$


* As stated before, by ‘‘ array” is meant either row or column. 
Applying equations (8) and (9) to each of the above, we have, from the two
equations on the left:

$$\frac{1}{n(x)}\int n(x, y)\,y\,dy = A + Bx + Cx^2, \qquad \frac{1}{n(x)}\int n(x, y)\,y^2\,dy = D + (A + Bx + Cx^2)^2 \tag{11}$$

Multiply each of the identities in (11) by $x^t n(x)\,dx/n$ and integrate for
$t = 0, 1, 2, \ldots$, but retain only those equations which involve moments of the
fourth or lower degrees:

$$\begin{aligned}
0 &= A + C, \\
r &= B + C\mu_{x^3}, \\
q_{x^2 y} &= A + B\mu_{x^3} + C\mu_{x^4}, \\
1 &= D + A^2 + B^2 + C^2\mu_{x^4} + 2AC + 2BC\mu_{x^3}.
\end{aligned}$$

It is obvious that if the values of $\mu_{x^3}$ and $\mu_{x^4}$, in fact even of $r$ and $q_{x^2 y}$ also,
were preassigned arbitrarily, it would still be possible usually to find values for
$A$, $B$, $C$, and $D$. Similar reasoning applies to the letters $A'$, $B'$, $C'$, and $D'$. As far
as our previous reasoning has led us, therefore, there is nothing to prevent an
assumption of parabolic regression and constant partial $\sigma$'s, no matter to what
types of frequency distribution the marginal totals may belong. 
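The claim that $A$, $B$, $C$ and $D$ can usually be found may be made concrete by solving the four moment equations of Case (f) in turn. The sketch below is an illustration only: the function name and the numerical moments are assumptions, and it works in the units of the text ($\sigma_x = \sigma_y = 1$, origin at the centre of gravity). Substituting $A = -C$ and $B = r - C\mu_{x^3}$ into the third equation gives $C(\mu_{x^4} - \mu_{x^3}^2 - 1) = q_{x^2 y} - r\mu_{x^3}$.

```python
def parabolic_constants(r, mu3, mu4, q21):
    """Solve the four moment equations of Case (f) for A, B, C, D.

    r   : correlation q_xy
    mu3 : third moment of the x marginal total
    mu4 : fourth moment of the x marginal total
    q21 : product moment q_{x^2 y}
    """
    denom = mu4 - mu3 ** 2 - 1.0
    if abs(denom) < 1e-12:
        # The exceptional case in which the equations do not fix C.
        raise ValueError("mu4 - mu3^2 - 1 vanishes; C is not determined")
    C = (q21 - r * mu3) / denom
    A = -C
    B = r - C * mu3
    D = 1.0 - (A * A + B * B + C * C * mu4 + 2 * A * C + 2 * B * C * mu3)
    return A, B, C, D


# Arbitrary preassigned moments, chosen only for illustration.
A, B, C, D = parabolic_constants(r=0.5, mu3=0.4, mu4=3.2, q21=0.3)
```

For these values $D$ comes out between 0 and 1, as a constant array variance must; moments for which $D$ is negative would not correspond to a possible table.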


(3) Three Variates. In constructing a multiple correlation solid of three
dimensions, one divides ordinary space into cells* of volume $\Delta x\,\Delta y\,\Delta z$. Let
$n(x, y, z)$ be the frequency in the cell whose coordinates are $(x, y, z)$; or, if one
wishes to use the language of integrals rather than of sums, one may think of a
non-homogeneous solid and of $n(x, y, z)$ as the density at the point $(x, y, z)$. The
total frequency of the $z$ column whose coordinates are $(x, y)$ may be written, in
conformity with the notation of the previous section, $n(x, y)$, or $n_{z . xy}$, and

$$n(x, y) = \int n(x, y, z)\,dz.$$

The total frequency of the slab parallel to the $yz$ plane whose coordinate is $x$ is

$$n(x) = n_{yz . x} = \int n(x, y)\,dy = \iint n(x, y, z)\,dy\,dz \tag{1}$$

The total frequency of the solid is

$$n = \int n(x)\,dx = \iint n(x, y)\,dx\,dy = \iiint n(x, y, z)\,dx\,dy\,dz \tag{2}$$

Let the origin be chosen at the "centre of gravity" of this solid so that

$$\iiint n(x, y, z)\,x\,dx\,dy\,dz = \int n(x)\,x\,dx = \int n(y)\,y\,dy = \int n(z)\,z\,dz = 0 \tag{3}$$

Then

$$\sigma_x^2 = \frac{1}{n}\iiint x^2\,n(x, y, z)\,dx\,dy\,dz = \frac{1}{n}\iint x^2\,n(x, y)\,dx\,dy = \frac{1}{n}\int x^2\,n(x)\,dx \tag{4}$$


* Some authors define cell so that it has the volume $z\,\Delta x\,\Delta y$.
Let the units be so chosen that

$$\sigma_x = \sigma_y = \sigma_z = 1 \tag{5}$$

As usual, the product moments and simple moments are denoted by

$$q_{x^a y^b z^c} = \frac{1}{n}\iiint n(x, y, z)\,x^a y^b z^c\,dx\,dy\,dz,$$

and in particular $\mu_{x^a} = q_{x^a y^0 z^0}$
is the $a$th moment of the marginal total parallel to the $x$ axis; $r_{xy} = q_{xyz^0}$ is the
total correlation between $x$ and $y$, i.e., the simple two-way correlation between the
totals of the $z$ columns. Let $g(x, y)$ be the mean of the $z$ column whose coordinates
are $(x, y)$, then

$$g(x, y) = \frac{1}{n(x, y)}\int n(x, y, z)\,z\,dz \tag{6}$$

Regarded as an equation in $x$ and $y$ this defines the locus of the mean points
of all $z$ columns and is therefore the equation of the regression surface, $z$ on $xy$.
Similarly, the regression $y$ on $xz$ is

$$h(x, z) = \frac{1}{n(x, z)}\int n(x, y, z)\,y\,dy \tag{7}$$

and of $x$ on $yz$ is

$$k(y, z) = \frac{1}{n(y, z)}\int n(x, y, z)\,x\,dx \tag{8}$$


The section of the $g(x, y)$ surface obtained by fixing $x$ is the partial regression
of $z$ on $y$ for the fixed $x$. It is the two-way regression curve, $z$ on $y$, in the slab
whose coordinate is $x$. No special notation is required; it is only necessary to
regard $x$ as fixed in (6). Equation (6) thus may be thought of as the equation
either of a regression surface or of a partial regression curve. The mean of that
marginal total of this slab which extends in the $z$ direction is to be designated by
$\alpha(x)$, and the equation $z = \alpha(x)$ is the equation of the total regression, $z$ on $x$.
There are six total regressions:

$$\begin{aligned}
z \text{ on } x:\quad \alpha(x) &= \frac{1}{n(x)}\int n(x, y)\,g(x, y)\,dy = \frac{1}{n(x)}\iint n(x, y, z)\,z\,dy\,dz \\
x \text{ on } z:\quad \alpha'(z) &= \frac{1}{n(z)}\int n(y, z)\,k(y, z)\,dy = \frac{1}{n(z)}\iint n(x, y, z)\,x\,dy\,dx \\
z \text{ on } y:\quad \beta(y) &= \frac{1}{n(y)}\int n(x, y)\,g(x, y)\,dx = \frac{1}{n(y)}\iint n(x, y, z)\,z\,dx\,dz \\
y \text{ on } z:\quad \beta'(z) &= \frac{1}{n(z)}\int n(x, z)\,h(x, z)\,dx = \frac{1}{n(z)}\iint n(x, y, z)\,y\,dx\,dy \\
y \text{ on } x:\quad \gamma(x) &= \frac{1}{n(x)}\int n(x, z)\,h(x, z)\,dz = \frac{1}{n(x)}\iint n(x, y, z)\,y\,dy\,dz \\
x \text{ on } y:\quad \gamma'(y) &= \frac{1}{n(y)}\int n(y, z)\,k(y, z)\,dz = \frac{1}{n(y)}\iint n(x, y, z)\,x\,dx\,dz
\end{aligned} \tag{9}$$
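The two expressions given for each total regression in (9) can be compared on a small frequency solid. The sketch below does this for $\alpha(x)$, the regression of $z$ on $x$; the solid, the coordinate values and the function names are invented for illustration, and sums replace the integrals.

```python
# Hypothetical 2 x 3 x 2 solid: solid[i][j][k] is the frequency at (xs[i], ys[j], zs[k]).
xs = [0.0, 1.0]
ys = [-1.0, 0.0, 2.0]
zs = [10.0, 20.0]
solid = [
    [[3, 1], [2, 2], [1, 4]],
    [[1, 2], [3, 3], [2, 5]],
]


def n_xy(i, j):
    """Total frequency of the z column at (xs[i], ys[j])."""
    return sum(solid[i][j])


def n_x(i):
    """Total frequency of the slab whose coordinate is xs[i]."""
    return sum(n_xy(i, j) for j in range(len(ys)))


def g(i, j):
    """Mean of the z column at (xs[i], ys[j]): the regression surface, equation (6)."""
    return sum(f * z for f, z in zip(solid[i][j], zs)) / n_xy(i, j)


def alpha_from_surface(i):
    """First form in (9): average g over the slab, weighting by n(x, y)."""
    return sum(n_xy(i, j) * g(i, j) for j in range(len(ys))) / n_x(i)


def alpha_direct(i):
    """Second form in (9): mean of z over the whole slab."""
    return sum(f * z for column in solid[i] for f, z in zip(column, zs)) / n_x(i)
```

The two forms agree because weighting each column mean by its column total simply restores the column sums.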


The notation is in agreement with that of Section (2). For example, if we
integrate $n(x, y, z)$ with respect to $z$, we get $n(x, y)$, a set of totals which define a
two-way correlation table like that of Section (2). By equation (6) of Section (2)
we have also

$$\gamma'(y) = \frac{1}{n(y)}\int n(x, y)\,x\,dx = \frac{1}{n(y)}\iint n(x, y, z)\,x\,dx\,dz,$$

which is in agreement with the last equation of (9) above. Finally it is necessary
to write equations for the "partial product and simple moments about the
coordinate axes." These are the ordinary two-way moments about the coordinate axes
in the correlation tables defined by the slabs of our solid. Thus, in the slab whose
coordinate is $x$, the partial product moment of order $ab$ is

$$q'_{y^a z^b . x} = \frac{1}{n(x)}\iint n(x, y, z)\,y^a z^b\,dy\,dz \tag{10}$$

When $b = 0$ in this, it becomes a partial simple moment:

$$\mu'_{y^a . x} = \frac{1}{n(x)}\iint n(x, y, z)\,y^a\,dy\,dz \tag{11}$$

and when also $a = 1$, it becomes the first partial simple moment, which, regarded
as a function of $x$, is the total regression, $y$ on $x$:

$$\mu'_{y . x} = \gamma(x) \tag{12}$$


(4) Relations between Regression Surfaces and Curves. It is clear from the 
formulae of Section (3) that there exist relations between the regression surfaces 
and the total regression curves of a solid of frequency. In developing these rela- 
tions we have the option of two alternative courses. Either we may assume simple 
expressions for our total regression curves, and find the resultant (usually more 
complicated) forms for our regression surfaces; or we may assume simple expres- 
sions for our regression surfaces, and find the resultant (likewise usually more 
complicated) forms for our total regression curves. Which assumption is likely to 
prove the more useful? We may put the question in another form which will 
suggest the answer. As pointed out before, the regression surface, say $g(x, y)$, is,
when one of the variables, say $y$, is fixed, a partial regression curve, $z$ on $x$. Now,
which is simpler, this partial regression, or the total regression of $z$ on $x$? Usually
the former, the partial regression, is simpler, because, when $y$ is fixed, one is
dealing with a more homogeneous population than when $y$ is allowed to vary. An
example, to be employed numerically later, will illustrate this remark. Let $x$
measure the height, y the weight, and z the age of an individual. The relation 
between height and weight is expected to be simpler for that portion of the 
population all members of which have the same age than for the whole group. 
The reader should not confuse this question with the question of sampling. The 
fact that in any finite group the number of individuals whose age was the same 
would be small, and therefore large discrepancies from the ideal relation apt to 
occur, has nothing to do with the question. We are concerned here with the ideal 
relation itself, not with the sampling variation from it. We may think of the popula- 
tion as infinite, or we may state the example in another way: Select at random a 
million Englishmen; also a million Englishmen whose age is twenty-one. The 
relation between height and weight should be simpler in the second case than in 





the first, unless it be linear in both. We choose, then, the second of the
alternatives, and state, at first without demonstration, the following theorems and
corollaries.


Theorem I. Let all the partial regressions be linear functions, such as:

$$g(x, y) = g_{00} + g_{10}x + g_{01}y + g_{11}xy.$$

Then all the total regressions are of the form:

$$\alpha(x) = \frac{\text{parabola* in } x}{\text{parabola in } x}.$$


Corollary. If also the regression surfaces be linear, so that in particular
$g_{11} = 0$, then all the total regressions are linear.


This is the well-known all linear case. To obtain this sort of total regression, 
it is not necessary to assume a normal solid of frequency or homoscedasticity, 
provided one assumes linear regression surfaces. 
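The form stated in Theorem I can be obtained by eliminating $\gamma$ between the pair $\alpha = g_{00} + g_{10}x + (g_{01} + g_{11}x)\gamma$ and $\gamma = h_{00} + h_{10}x + (h_{01} + h_{11}x)\alpha$, the second of which assumes that $h(x, z) = h_{00} + h_{10}x + h_{01}z + h_{11}xz$ is the linear regression of $y$ on $xz$. The sketch below carries out that elimination with coefficient lists; the helper names and the numerical coefficients are illustrative only.

```python
def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists, lowest degree first."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_add(p, q):
    """Sum of two polynomials given as coefficient lists."""
    size = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0.0) + (q[i] if i < len(q) else 0.0)
            for i in range(size)]


def poly_eval(p, x):
    """Value of the polynomial p at x."""
    return sum(c * x ** i for i, c in enumerate(p))


def total_regression_alpha(g00, g10, g01, g11, h00, h10, h01, h11):
    """Numerator and denominator of alpha(x) when g and h are both linear.

    alpha * (1 - g_slope * h_slope) = g_free + g_slope * h_free, where
    g = g_free(x) + g_slope(x) * y and h = h_free(x) + h_slope(x) * z.
    """
    g_free, g_slope = [g00, g10], [g01, g11]
    h_free, h_slope = [h00, h10], [h01, h11]
    numerator = poly_add(g_free, poly_mul(g_slope, h_free))
    denominator = poly_add([1.0], [-c for c in poly_mul(g_slope, h_slope)])
    return numerator, denominator


# Arbitrary coefficients, for illustration only.
num, den = total_regression_alpha(1.0, 2.0, 0.5, 0.25, -1.0, 0.5, 0.3, 0.1)
alpha_at_2 = poly_eval(num, 2.0) / poly_eval(den, 2.0)
```

Both lists have three entries, so $\alpha$ is a parabola divided by a parabola. Setting $g_{11} = h_{11} = 0$ reduces the denominator to a constant and the numerator to a line, which is the all-linear case of the corollary.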


Theorem II. Let all the partial regressions be linear, except that $g(x, y)$ shall
be parabolic in $x$, i.e.,

$$g(x, y) = g_{00} + g_{10}x + g_{01}y + g_{11}xy + g_{20}x^2 + g_{21}x^2y.$$

Then in general† the total regressions will have the forms:

(Cubic in $x$) $\alpha$ + (cubic in $x$) = 0,
(Parabola in $z$) $\alpha'$ + (parabola in $z$) = 0,
(Cubic in $y$) $\beta^2$ + (cubic in $y$) $\beta$ + (cubic or higher in $y$) = 0,
(Parabola in $z$) $\beta'$ + (parabola in $z$) = 0,
(Cubic in $x$) $\gamma$ + (cubic in $x$) = 0,
(Parabola in $y$) $\gamma'^2$ + (parabola in $y$) $\gamma'$ + (parabola or higher in $y$) = 0.

Corollary. If the product terms in the regression surface formulae be omitted,
and if the partial $\sigma_x$ of the slab perpendicular to the $y$ axis be constant
($\mu_{x^2 . y}$ = constant in $y$); then in general the total regressions have the forms:

$\alpha$ = parabola‡,
$\alpha'$ = line,
$\beta^2$ + (line) $\beta$ + (parabola) = 0,
$\beta'$ = line,
$\gamma$ = parabola,
$\gamma'^2$ + (constant) $\gamma'$ + (line) = 0.


* I shall use the words, parabola, cubic, etc., in $x$, to denote a polynomial of the 2nd, 3rd, etc.,
degrees in $x$; and the words, parabola or higher in $x$, to denote a polynomial in $x$ of at least the second
degree.

† I.e., except for specialized cases, one of which is given in the corollary following, where these
forms are simplified.

‡ Parabola in $x$. Since $\alpha$ is a function of $x$ only, this is obvious, and so the words "in $x$" may be
omitted. Hereafter, in like cases, the independent variable will not be written.
Theorem III. Let all the partials be linear except that $g(x, y)$ and $k(y, z)$
shall be parabolic in $x$ and $z$ respectively. Then in general the forms for the total
regressions are:

(Cubic) $\alpha$ + (cubic) = 0,
(Cubic) $\alpha'$ + (cubic) = 0,
(Cubic) $\beta^4$ + (cubic) $\beta^3$ + (cubic or higher) $\beta^2$ + (cubic or higher) $\beta$
+ (cubic or higher) = 0,
(Cubic) $\beta'$ + (cubic) = 0,
(Cubic) $\gamma$ + (cubic) = 0,
and $\gamma'$ has the same form as $\beta$.

Corollary. If the product terms in the regression surfaces be dropped, and if
the partial $\sigma$'s of the slabs perpendicular to the $y$ axis be constants ($\mu_{x^2 . y}$, $\mu_{z^2 . y}$
equal constants in $y$); then in general the total regressions have the forms:

$\alpha$, $\alpha'$, $\beta'$, $\gamma$ = parabolae; $\beta$ and $\gamma'$ have the same form, namely:
$\beta^4$ + (constant) $\beta^3$ + (line) $\beta^2$ + (line) $\beta$ + (parabola) = 0.

Theorem IV. Let all the partial regressions be parabolic, except that $h(x, z)$
and $k(y, z)$ shall be linear in $x$ and $y$ respectively. Then in general the forms for
the total regressions are:

(Quartic) $\alpha^4$ + (quartic) $\alpha^3$ + (quartic or higher) $\alpha^2$ + (quartic or higher) $\alpha$
+ (quartic or higher) = 0,

$\beta$ has the same form as $\alpha$,

(Quartic) $\alpha'$ + (quartic) = 0,

$\beta'$ has the same form as $\alpha'$,

(Quintic) $\gamma^4$ + (quintic) $\gamma^3$ + (quintic or higher) $\gamma^2$ + (quintic or higher) $\gamma$
+ (quintic or higher) = 0,

and $\gamma'$ has the same form as $\gamma$.

Theorem V. Let all the partial regressions be parabolic. Then in general the
forms for the total regressions are:

(Sextic) $\alpha^4$ + (sextic) $\alpha^3$ + (sextic or higher) $\alpha^2$ + (sextic or higher) $\alpha$
+ (sextic or higher) = 0,
and similar equations for the other variables.

The number of such theorems could be increased at pleasure, but these are 
sufficient to give the reader a general idea of what to expect. They are not 
intended to be sufficient to include many of the cases which will occur in practice. 
The method of derivation will now be illustrated in the proof of one of these 
theorems, and by application to a practical case. Then some comment will be 
made on the general inferences to be drawn.


Proof of Theorem III. By hypothesis the regression surfaces have the forms:

$$\begin{aligned}
g(x, y) &= g_{00} + g_{10}x + g_{01}y + g_{11}xy + g_{20}x^2 + g_{21}x^2y, \\
h(x, z) &= h_{00} + h_{10}x + h_{01}z + h_{11}xz, \\
k(y, z) &= k_{00} + k_{10}y + k_{01}z + k_{11}yz + k_{02}z^2 + k_{12}yz^2
\end{aligned} \tag{1}$$
We first find the totals by the use of equations (9) of Section (3). The first
of the above equations holds identically for all values of $x$ and $y$, and we may
multiply through by $n(x, y)\,dy/n(x)$ and integrate with respect to $y$, obtaining
the following identity in $x$:

$$\alpha(x) = g_{00} + g_{10}x + g_{20}x^2 + (g_{01} + g_{11}x + g_{21}x^2)\,\frac{1}{n(x)}\int n(x, y)\,y\,dy.$$

But

$$\frac{1}{n(x)}\int n(x, y)\,y\,dy = \frac{1}{n(x)}\iint n(x, y, z)\,y\,dy\,dz = \gamma(x).$$

So, for all values of $x$ identically:

$$\begin{aligned}
\alpha &= g_{00} + g_{10}x + g_{20}x^2 + (g_{01} + g_{11}x + g_{21}x^2)\,\gamma, \\
\gamma &= h_{00} + h_{10}x + (h_{01} + h_{11}x)\,\alpha
\end{aligned} \tag{2}$$

and, similarly,

$$\begin{aligned}
\alpha' &= k_{00} + k_{01}z + k_{02}z^2 + (k_{10} + k_{11}z + k_{12}z^2)\,\beta', \\
\beta' &= h_{00} + h_{01}z + (h_{10} + h_{11}z)\,\alpha'
\end{aligned} \tag{3}$$

and, using equation (11) of Section (3),

$$\begin{aligned}
\beta &= g_{00} + g_{01}y + (g_{10} + g_{11}y)\,\gamma' + (g_{20} + g_{21}y)\,\mu'_{x^2 . y}, \\
\gamma' &= k_{00} + k_{10}y + (k_{01} + k_{11}y)\,\beta + (k_{02} + k_{12}y)\,\mu'_{z^2 . y}
\end{aligned} \tag{4}$$

We may now solve each pair of equations for the values of the Greek letters.
By (2)

$$\alpha = g_{00} + g_{10}x + g_{20}x^2 + (g_{01} + g_{11}x + g_{21}x^2)\{h_{00} + h_{10}x + (h_{01} + h_{11}x)\,\alpha\},$$

$$\alpha\,\{1 - (g_{01} + g_{11}x + g_{21}x^2)(h_{01} + h_{11}x)\} = g_{00} + g_{10}x + g_{20}x^2 + (g_{01} + g_{11}x + g_{21}x^2)(h_{00} + h_{10}x);$$

i.e., (cubic in $x$) $\alpha$ + (cubic in $x$) = 0, in general, save for the vanishing of certain
coefficients, as for example as stated in the corollary. From (2) we may also
derive:

$$\gamma = h_{00} + h_{10}x + (h_{01} + h_{11}x)\{g_{00} + g_{10}x + g_{20}x^2 + (g_{01} + g_{11}x + g_{21}x^2)\,\gamma\},$$

$$\gamma\,\{1 - (h_{01} + h_{11}x)(g_{01} + g_{11}x + g_{21}x^2)\} = h_{00} + h_{10}x + (h_{01} + h_{11}x)(g_{00} + g_{10}x + g_{20}x^2);$$


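i.e., (cubic in $x$) $\gamma$ + (cubic in $x$) = 0. Equations (3) are exactly similar to equations
(2) and so the expressions for $\alpha'$ and $\beta'$ will be similar to those for $\alpha$ and $\gamma$. Before
solving equations (4), we write, by equations (9) of Section (2),

$$\mu'_{x^2 . y} = \mu_{x^2 . y} + \gamma'^2, \qquad \mu'_{z^2 . y} = \mu_{z^2 . y} + \beta^2.$$

Then (4) give:

$$\begin{aligned}
\beta = {} & g_{00} + g_{01}y + (g_{10} + g_{11}y)\,[k_{00} + k_{10}y + (k_{01} + k_{11}y)\,\beta + (k_{02} + k_{12}y)(\mu_{z^2 . y} + \beta^2)] \\
& + (g_{20} + g_{21}y)\,[\mu_{x^2 . y} + \{k_{00} + k_{10}y + (k_{01} + k_{11}y)\,\beta + (k_{02} + k_{12}y)(\mu_{z^2 . y} + \beta^2)\}^2];
\end{aligned}$$

i.e., $\beta$ = line + line {line + (line) $\beta$ + (line) $\mu_{z^2 . y}$ + (line) $\beta^2$}
+ (line) {$\mu_{x^2 . y}$ + (line)$^2$ + (line)$^2 \beta^2$ + (line)$^2 \mu^2_{z^2 . y}$ + (line)$^2 \beta^4$
+ 2 [(line) (line) $\beta$ + (line) (line) $\mu_{z^2 . y}$ + (line) (line) $\beta^2$
+ (line) (line) $\beta\mu_{z^2 . y}$ + (line) (line) $\beta^3$ + (line) (line) $\mu_{z^2 . y}\beta^2$]}.

Hence: $\beta^4$ (cubic) + $\beta^3$ (cubic) + $\beta^2$ (cubic + cubic $\mu_{z^2 . y}$)
+ $\beta$ (cubic + cubic $\mu_{z^2 . y}$) + (line $\mu_{x^2 . y}$ + cubic $\mu^2_{z^2 . y}$ + cubic $\mu_{z^2 . y}$ + cubic) = 0.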
This equation verifies the statement given in the theorem regarding $\beta$. If the
partial moments $\mu_{x^2 . y}$ and $\mu_{z^2 . y}$ are constants in $y$, the words "or higher" in that
statement may be deleted; otherwise they remain. It is obvious that the forms of
$\beta$ and $\gamma'$ must be similar.
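As an illustration of how the corollary to Theorem III arises from the same pair of equations, the following sketch drops the product terms, i.e. assumes $g_{11} = g_{21} = h_{11} = 0$, and assumes further that $g_{01}h_{01} \neq 1$. The relations $\alpha = g_{00} + g_{10}x + g_{20}x^2 + g_{01}\gamma$ and $\gamma = h_{00} + h_{10}x + h_{01}\alpha$ then give:

```latex
\begin{aligned}
\alpha\,(1 - g_{01}h_{01}) &= g_{00} + g_{10}x + g_{20}x^{2} + g_{01}\,(h_{00} + h_{10}x), \\
\gamma\,(1 - g_{01}h_{01}) &= h_{00} + h_{10}x + h_{01}\,(g_{00} + g_{10}x + g_{20}x^{2}).
\end{aligned}
```

The multiplier on the left is now a constant, so $\alpha$ and $\gamma$ are each parabolas in $x$, in agreement with the forms listed in the corollary.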

(5) Elderton-Isserlis Example. In the second* of two papers entitled, “On 
the Partial Correlation Ratio,” Dr L. Isserlis has discussed some data prepared by 
Miss E. M. Elderton, in which the height (x), weight (y), and age (z) of a group of 
individuals are correlated. The units and origin chosen for these variables in that 
paper are the same as those chosen by me in this one, and so it is easy to write in 
our notation the conditions of Isserlis. He finds that 

$$g(x, y) = g_{00} + g_{01}y + x\,(g_{10} + g_{11}y) \tag{A}$$

is a good fit to the regression of $z$ on $xy$, and he states that Soper has found a
good fit also to the regression of $y$ on $xz$ by using

$$h(x, z) = h_{00} + h_{01}z + h_{02}z^2 + x\,(h_{10} + h_{11}z + h_{12}z^2) \tag{B}$$

No attempt is made to find the form of $k(y, z)$. It will be our object to discover
whether, consistently with (A) and (B), and with certain obvious restrictions on
the total regressions to be mentioned directly, it is possible that $k(y, z)$ should be
a polynomial of as low a degree as the second in $y$ and $z$ separately. So let

$$k(y, z) = k_{00} + k_{01}z + k_{02}z^2 + y\,(k_{10} + k_{11}z + k_{12}z^2) + y^2\,(k_{20} + k_{21}z + k_{22}z^2) \tag{C}$$

A glance at the regression curves pictured on pages 53-56 of Isserlis's paper
is sufficient to convince one that the following may also be safely assumed: (a) the
total regression of $z$ on $x$, $\alpha(x)$, is not a conic section or a straight line; (b) the same
is true of the total regression of $z$ on $y$, $\beta(y)$; (c) the total regression of $x$ on $y$,
$\gamma'(y)$, is not a constant; and (d) the same is true of the total regression of $y$ on $x$,
$\gamma(x)$. With regard to the remaining two regressions, they appear to be nearly
linear, and although this may be an illusion, we shall so assume: (e) the total
regression of $y$ on $z$, $\beta'(z)$, is a line; and (f) the total regression of $x$ on $z$, $\alpha'(z)$, is
a line. Our proposed problem might, therefore, be stated as follows. Is it possible
that $k(y, z)$ shall be a polynomial as simple as (C) and at the same time that
$\alpha'$ and $\beta'$ shall be linear, without contradicting at least one of the necessary
hypotheses, (A), (B), (a), (b), (c), and (d)? It would seem to be worth while
undertaking such a discussion before proceeding further with the numerical 
problem of fitting the regression surfaces of this solid. 
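Starting from (A), (B), and (C), we may write our total regressions:

$$\begin{aligned}
\alpha(x) &= g_{00} + g_{10}x + (g_{01} + g_{11}x)\,\gamma, \\
\gamma(x) &= h_{00} + h_{10}x + (h_{01} + h_{11}x)\,\alpha + (h_{02} + h_{12}x)\,\mu'_{z^2 . x}
\end{aligned} \tag{1}$$

$$\begin{aligned}
\alpha'(z) &= k_{00} + k_{01}z + k_{02}z^2 + (k_{10} + k_{11}z + k_{12}z^2)\,\beta' + (k_{20} + k_{21}z + k_{22}z^2)\,\mu'_{y^2 . z}, \\
\beta'(z) &= h_{00} + h_{01}z + h_{02}z^2 + (h_{10} + h_{11}z + h_{12}z^2)\,\alpha'
\end{aligned} \tag{2}$$

$$\begin{aligned}
\beta(y) &= g_{00} + g_{01}y + (g_{10} + g_{11}y)\,\gamma', \\
\gamma'(y) &= (k_{00} + k_{10}y + k_{20}y^2) + (k_{01} + k_{11}y + k_{21}y^2)\,\beta + (k_{02} + k_{12}y + k_{22}y^2)\,\mu'_{z^2 . y}
\end{aligned} \tag{3}$$

where

$$\mu'_{z^2 . x} = \mu_{z^2 . x} + \alpha^2, \qquad \mu'_{y^2 . z} = \mu_{y^2 . z} + \beta'^2, \qquad \mu'_{z^2 . y} = \mu_{z^2 . y} + \beta^2 \tag{4}$$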


* Biometrika, Vol. XI. pp. 50-66.
First solve (2):

$$\begin{aligned}
& \alpha'^2\,(k_{20} + k_{21}z + k_{22}z^2)(h_{10} + h_{11}z + h_{12}z^2)^2 \\
& + \alpha'\,\{2\,(k_{20} + k_{21}z + k_{22}z^2)(h_{00} + h_{01}z + h_{02}z^2)(h_{10} + h_{11}z + h_{12}z^2) \\
& \qquad + (k_{10} + k_{11}z + k_{12}z^2)(h_{10} + h_{11}z + h_{12}z^2) - 1\} \\
& + (k_{20} + k_{21}z + k_{22}z^2)\,\{\mu_{y^2 . z} + (h_{00} + h_{01}z + h_{02}z^2)^2\} \\
& + (k_{10} + k_{11}z + k_{12}z^2)(h_{00} + h_{01}z + h_{02}z^2) + k_{00} + k_{01}z + k_{02}z^2 = 0
\end{aligned} \tag{5}$$

$$\begin{aligned}
& \beta'^2\,(h_{10} + h_{11}z + h_{12}z^2)(k_{20} + k_{21}z + k_{22}z^2) + \beta'\,\{(h_{10} + h_{11}z + h_{12}z^2)(k_{10} + k_{11}z + k_{12}z^2) - 1\} \\
& + h_{00} + h_{01}z + h_{02}z^2 + (h_{10} + h_{11}z + h_{12}z^2)(k_{00} + k_{01}z + k_{02}z^2) \\
& + (h_{10} + h_{11}z + h_{12}z^2)(k_{20} + k_{21}z + k_{22}z^2)\,\mu_{y^2 . z} = 0
\end{aligned} \tag{6}$$

By hypothesis, (5) and (6) must be straight lines. It is therefore necessary
that either:
(i) $k_{20} + k_{21}z + k_{22}z^2 = 0$ for all values of $z$, and not (ii); or
(ii) $h_{10} + h_{11}z + h_{12}z^2 = 0$ for all values of $z$, and not (i); or
(iii) both (i) and (ii).
Before solving the other equations (1) and (3), it is convenient to pursue each
of the above three possibilities.
Suppose (i). Then $k_{20} = k_{21} = k_{22} = 0$, and equation (5) becomes:

$$\alpha'\,\{(k_{10} + k_{11}z + k_{12}z^2)(h_{10} + h_{11}z + h_{12}z^2) - 1\} + (k_{10} + k_{11}z + k_{12}z^2)(h_{00} + h_{01}z + h_{02}z^2) + k_{00} + k_{01}z + k_{02}z^2 = 0.$$

For this to be a line it is necessary that:

$$k_{12}h_{12} = 0, \quad k_{11}h_{12} + k_{12}h_{11} = 0, \quad k_{10}h_{12} + k_{11}h_{11} + k_{12}h_{10} = 0, \quad k_{10}h_{11} + k_{11}h_{10} = 0 \tag{7}$$

and that

$$k_{12}h_{02} = 0, \quad k_{11}h_{02} + k_{12}h_{01} = 0, \quad k_{02} + k_{10}h_{02} + k_{11}h_{01} + k_{12}h_{00} = 0 \tag{8}$$

Equation (6) now becomes:

$$\beta'\,\{(h_{10} + h_{11}z + h_{12}z^2)(k_{10} + k_{11}z + k_{12}z^2) - 1\} + h_{00} + h_{01}z + h_{02}z^2 + (h_{10} + h_{11}z + h_{12}z^2)(k_{00} + k_{01}z + k_{02}z^2) = 0,$$

and for this to be a straight line it is necessary that, in addition to (7) and (8),

$$k_{02}h_{12} = 0, \quad k_{01}h_{12} + k_{02}h_{11} = 0, \quad h_{02} + k_{00}h_{12} + k_{01}h_{11} + k_{02}h_{10} = 0 \tag{9}$$
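Conditions (7), (8) and (9) state that particular coefficients of certain polynomial products in $z$ vanish, so they can be checked mechanically for any proposed set of constants. The sketch below is illustrative only: the function names and the sample values are assumptions, the coefficient lists run from lowest degree upward, and supposition (i) is taken for granted, so that $k_{20} = k_{21} = k_{22} = 0$.

```python
def poly_mul(p, q):
    """Product of two polynomials in z, coefficient lists lowest degree first."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_add(p, q):
    """Sum of two polynomials in z."""
    size = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0.0) + (q[i] if i < len(q) else 0.0)
            for i in range(size)]


def conditions_under_i(k0, k1, h0, h1, tol=1e-12):
    """Return the truth of conditions (7), (8), (9) as a tuple of three booleans.

    k0 = [k00, k01, k02], k1 = [k10, k11, k12],
    h0 = [h00, h01, h02], h1 = [h10, h11, h12].
    """
    multiplier = poly_mul(k1, h1)             # multiplier of alpha' (and beta'), plus 1
    free_5 = poly_add(poly_mul(k1, h0), k0)   # term of (5) free of alpha'
    free_6 = poly_add(h0, poly_mul(h1, k0))   # term of (6) free of beta'
    cond7 = all(abs(c) <= tol for c in multiplier[1:])  # powers z .. z^4 vanish
    cond8 = all(abs(c) <= tol for c in free_5[2:])      # powers z^2 .. z^4 vanish
    cond9 = all(abs(c) <= tol for c in free_6[2:])      # powers z^2 .. z^4 vanish
    return cond7, cond8, cond9


# Hypothetical constants with k10 = k11 = k12 = k01 = k02 = 0 and h02 = -k00 * h12.
satisfied = conditions_under_i([2.0, 0.0, 0.0], [0.0, 0.0, 0.0],
                               [1.0, 3.0, -2.0], [0.5, 0.7, 1.0])
# Making k12 and h12 both non-zero breaks the first equation of (7).
violated = conditions_under_i([2.0, 0.0, 0.0], [0.0, 0.0, 1.0],
                              [1.0, 3.0, -2.0], [0.5, 0.7, 1.0])
```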
Consider the inferences to be drawn from (7), (8), and (9). It is necessary by
(7) that, either
(ia) $k_{12} = 0$, and $h_{12} \neq 0$; or
(ib) $k_{12} = 0$, and $h_{12} = 0$; or
(ic) $k_{12} \neq 0$, and $h_{12} = 0$.
First suppose (ia). Then (7), (8), and (9) become:

$$k_{10} = k_{11} = k_{12} = k_{01} = k_{02} = h_{02} + k_{00}h_{12} = 0 \tag{10}$$

Condition (i) and equation (10) placed in equation (3) lead to the conclusion
that $\gamma'(y) = k_{00}$, contradicting hypothesis (c).
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Neat suppose (ib). We have ky = ky =ky = ky =hy =0, and equations (7), (8), 
and (9) become : 
Ky hu =kylyt ky hio= ky he=hoet ki host ky ln = heel = het hy hin + koh =0...(11), 
and it becomes necessary to consider three further cases : 
Case (1b,): ky, =0, hy #0; Case (ib.): ky =0, hy =0; Case (ib,): ky #0, hy =0. 
Suppose (ib,). Then ky =0, kwe=0, hot kyhy =0, 
and, by (3), B=Joot+ Gay + (Gro + Guy) hoo + kn B), 
which contradicts hypothesis (6). 
Suppose (ib,). Then ke + koh =O, hos + Koh = 0. 
This supposition does not lead to a direct contradiction of the ‘rypothesis, and will 


be considered further, after disposing of the other possibilities, all of which we 
shall find do lead to contradictions. 


Suppose (ib;). Then Nyy = hee = hea + kn hy = 9, 
and, by (1 ), a= Joo + Jw 7 (Ju = Ju”) (Ii + hin a), 
contradicting hypothesis (a). 


Now suppose (ic). We have
$$k_{12} \neq 0,\quad h_{12} = k_{20} = k_{21} = k_{22} = 0,$$
and so, by (7), (8), and (9), the following:
$$h_{10} = h_{11} = h_{01} = h_{02} = k_{02} + k_{12}h_{00} = 0 \qquad (12).$$
By (1), then, $\gamma(x) = h_{00}$, contradicting hypothesis (d).
Suppose (ii). Since $h_{10} = h_{11} = h_{12} = 0$,
equations (5) and (6) become:
$$-\alpha' + (k_{20} + k_{21}z + k_{22}z^2)(h_{00} + h_{01}z + h_{02}z^2)^2 + (k_{10} + k_{11}z + k_{12}z^2)(h_{00} + h_{01}z + h_{02}z^2) + k_{00} + k_{01}z + k_{02}z^2 = 0,$$
and $\beta' - h_{00} - h_{01}z - h_{02}z^2 = 0$.
For these to be lines it is necessary that $h_{02} = 0$. Then, by (1),
$$\alpha = j_{00} + j_{10}x + (j_{01} + j_{11}x)(h_{00} + h_{01}\alpha),$$
contradicting hypothesis (a).
Suppose (iii). The argument against (ii) holds here, and eliminates this case also. Let us return then to the only case which has not been proved impossible: Case (ib$_2$). We had
$$k_{20} = k_{21} = k_{22} = k_{12} = k_{11} = h_{12} = h_{11} = 0;$$
and $k_{02} + k_{10}h_{02} = 0$, $h_{02} + k_{02}h_{10} = 0$.
From the last two of these equations we may derive:
$$h_{02}(1 - k_{10}h_{10}) = 0, \quad\text{and}\quad k_{02}(1 - k_{10}h_{10}) = 0;$$
whence, either $h_{10}k_{10} = 1$, or else $h_{02} = k_{02} = 0$, but in the latter case, by (3),
$$\beta = j_{00} + j_{01}y + (j_{10} + j_{11}y)\{(k_{00} + k_{10}y) + k_{01}\beta\},$$
contradicting hypothesis (b). So we should add to the above
$$h_{10}k_{10} = 1 \qquad (13).$$
Our conditions involve the regression surfaces:
$$g(x, y) = j_{00} + j_{01}y + x(j_{10} + j_{11}y),$$
$$h(x, z) = h_{00} + h_{01}z + h_{02}z^2 + xh_{10},$$
$$k(y, z) = k_{00} + k_{01}z + k_{02}z^2 + yk_{10} \qquad (14).$$

Now it is probable that the regression h(x, z) cannot be fitted well by the expression in (14), for if it could be, Soper would probably not have used so many terms; but, in addition to this practical difficulty, the relation (13) imposes a restriction so severe as to make the form of k(y, z) given above highly improbable. That is, if $h_{10}$ has been determined before the k's, then there are left but three arbitrary constants in k(y, z). So it would probably not pay to pursue this case (ib$_2$) further by numerical work. It would seem necessary, therefore, to generalize the conditions laid down at the outset of this section if we are to fit regression surfaces to this solid.

(6) The following conclusions may be drawn from the developments of the preceding pages.

(a) It is known, as stated in Section (2), that, in a two-way correlation table, severe restrictions cannot be placed on the regression curves without restricting the type of frequency distribution to which the marginal totals may belong, and that, if one knew in advance, as often happens, that the frequency distributions of the marginal totals were not of such restricted type, it would be inconsistent to restrict too severely the regression curves. It has now been shown that there are also, not exactly similar but similarly important, inter-relations in a three-way table. If the total regression curves are determined first, as they usually would be, and if they are not of the types described in Section (4), it would be inconsistent to suppose the regression surfaces such simple polynomials as are there described*.
* This does not mean, of course, that it would not then be permissible in practice to try to fit the data with those simple polynomials. In so doing, however, one should recognize that an approximation is being made: the simple polynomial forms used should be thought of as approximations to the more complex forms strictly required. It is useful to know what are the mathematically self-consistent types rigorously demanded by a given problem, but this knowledge does not prejudice the conscious use of any approximate substitutes for them which may be practically convenient. Likewise, the remark in Section (1), that hyperbolic forms (equation 2) should be thought of as occurring frequently in two-way tables, does not exclude the use of polynomials in two-way tables, at least over a moderate range of the independent variate; but the hyperbolic forms have a decided theoretical advantage, and so, except that they may increase the burden of computation, they should be regarded as preferable.

(b) It has been found that the simplest assumptions frequently lead to total regressions at least as complicated in form as
$$y = \frac{\text{polynomial in } x}{\text{polynomial in } x}.$$
It seems probable that regressions of this type should be regarded as of common occurrence, and that when in a two-way table regression is not linear a regression of this type should be one of the first tried. For every two-way correlation table may be thought of as really a table of totals from a three-way table, as pointed out in Section (1). Another argument for two-way regressions of this form is that one of Narumi's simplest surfaces* gives regressions of the hyperbolic form:
$$y = \frac{a}{cx + d}.$$

(c) The reader will have been aware that the argument in (b) leads further than was indicated. If we are to infer the common forms of two-way regression curves by regarding them as the total regression curves of a three-way frequency solid, we may also regard them as the total regression curves of a four-way frequency solid, and proceed by making simple hypotheses as to its three-way regression solids; and so on, to any number of dimensions. It is undesirable to lengthen this paper by a detailed study of four-way solids, but the author has in fact carried such a study far enough to convince him that it is seldom if ever possible to posit simple polynomial regression solids in (m − 1)-way space (m > 3), and finally obtain simple total regression curves, save, of course, in the classic all linear case. It seems altogether likely that we may ultimately be obliged to abandon polynomial forms of regression and use transcendental functions instead.
(d) If it indeed be true that simple regression surfaces for solids of three (or more) dimensions often imply very complicated total regression curves, it is a fact to be reckoned with in many applications of statistics. Consider, for example, the problem of business forecasting. One wishes to find the most likely value of a variable z for given values of variables x, y, etc. To fix the ideas, suppose z to be the money value of a certain industrial security six months hence, x the index of present demand for the article produced, and y an index of present† inflation.

Now suppose one has a considerable amount of data, and in the attempt to solve this problem first seeks the functional relationship between z and x only. Suppose it to be obvious from a glance at the graph thus obtained that it is very complicated. The practical man's judgment is that it is useless to try to fit it with a “mathematical curve,” that a straight line is as good as anything, bad as that may be, and there is theoretical justification for that judgment. But he is apt to follow it up with the opinion that mathematics has no valuable place in this sort of problem anyhow, which is unfortunate. So far he has been dealing only with a total regression curve, z on x, but as a matter of fact z is really a function of two variables, x and y. It may well happen that the functional relationship between z and the two variables x and y is representable by a simple mathematical surface, and that this could be demonstrated if a solid correlation table were to be formed. I do not think that this method has been extensively tried. It involves a large amount of labour and of data, but it would seem to be a hopeful method of approach to the problem. In order to distinguish the two most important independent variables from among several it would be necessary to use multiple η, but this is not very much more difficult to compute than multiple r. I do not mean that methods of multiple correlation have not been tried, in the sense that the multiple correlation coefficient z on xy has been found, but this method we have already ruled out by supposing the total regressions non-linear. As we have seen, the only case where simple polynomial multiple regression leads to an equally simple total regression is the all linear case. It is peculiar in this respect. Therefore, lack of success in proceeding from simple linear to multiple linear regression should not discourage attempts to proceed from simple to multiple regression when the regressions are not linear.

* Seimatsu Narumi: “On the General Forms of Bivariate Frequency Distributions, etc.,” Biometrika, Vol. xv. pp. 216, 217.

† In order that the author may make his meaning clear it is necessary to suppose a simple case, simpler than actually exists, as the reader will be well aware. Let it be understood also that these indexes of present states might if desired be cumulative indexes, obtained by observation not only of the present moment but also of the recent past.





























MISCELLANEA. 


On First Power Methods of finding Correlation. 
By KARL PEARSON. 


(1) The “Linear” Ratio. The measurement of association by the correlation ratio η suffers from the disadvantage that η, when we sample from material where there is no association, still shows a positive value; η² is the sum of a number of positive squares and thus its average value in a number of samples is not zero. If $\bar{y}_x$ be the mean of the array $n_x$, and $\bar{y}$ the mean of the whole population, $\sigma_y$ its standard deviation, and N its size, it would be a distinct advantage to have to deal with
$$S\left(\frac{n_x}{N}\,\frac{\bar{y}_x - \bar{y}}{\sigma_y}\right) \quad\text{instead of}\quad S\left(\frac{n_x}{N}\left(\frac{\bar{y}_x - \bar{y}}{\sigma_y}\right)^2\right).$$

Now if we considered normal correlation and summed throughout the range, the above sum would be zero, but if we sum only for $\bar{y}_x > \bar{y}$ or for positive differences we have, x being measured from its mean:
$$S^{+}\,\frac{n_x(\bar{y}_x - \bar{y})}{N\sigma_y} = \frac{1}{\sqrt{2\pi}\,\sigma_x}\int_0^\infty \frac{rx}{\sigma_x}\,e^{-\frac{x^2}{2\sigma_x^2}}\,dx = \frac{r}{\sqrt{2\pi}} \qquad \text{(i)}.$$
Similarly:
$$S^{-}\,\frac{n_x(\bar{y}_x - \bar{y})}{N\sigma_y} = -\frac{r}{\sqrt{2\pi}} \qquad \text{(ii)}.$$
Or, we have:
$$r = \sqrt{\frac{\pi}{2}}\left\{S^{+}\left(\frac{n_x}{N}\,\frac{\bar{y}_x - \bar{y}}{\sigma_y}\right) + S^{-}\left(\frac{n_x}{N}\,\frac{\bar{y} - \bar{y}_x}{\sigma_y}\right)\right\} \qquad \text{(iii)}.$$


This is very simple theoretically, but it is not very effective in practice as we do not know in the case of grouped material* where the exact limit comes between $x > \bar{x}$ and $x < \bar{x}$, i.e. this point will probably be in the middle of an array. Let us call this array in which the mean, or for present purposes the median, occurs, the median array. Now in a portion of the median array $x > \bar{x}$ and in a second portion $x < \bar{x}$. What we have to determine are the contributions to the two sums of the parts of the median array on either side the median. Let the frequency of these parts be $n_1$ and $n_2$, the means $\bar{x}_1$, $\bar{y}_1$; $\bar{x}_2$, $\bar{y}_2$. On the marginal frequency for x let $Nz_1/\sigma_x$, $Nz/\sigma_x$ and $Nz_2/\sigma_x$ be the three ordinates of the normal curve, at the start, the median, and the finish of the median array. Then we require
$$\frac{n_1}{N}\,\frac{\bar{y} - \bar{y}_1}{\sigma_y} \quad\text{and}\quad \frac{n_2}{N}\,\frac{\bar{y}_2 - \bar{y}}{\sigma_y},$$
or using the regression line
$$r\,\frac{n_1}{N}\,\frac{\bar{x} - \bar{x}_1}{\sigma_x} \quad\text{and}\quad r\,\frac{n_2}{N}\,\frac{\bar{x}_2 - \bar{x}}{\sigma_x}.$$
But
$$\bar{x}_1 - \bar{x} = -\sigma_x\,\frac{z - z_1}{n_1/N} \quad\text{and}\quad \bar{x}_2 - \bar{x} = \sigma_x\,\frac{z - z_2}{n_2/N}.$$
Hence:
$$\frac{n_1}{N}\,\frac{\bar{y} - \bar{y}_1}{\sigma_y} = r(z - z_1) \quad\text{and}\quad \frac{n_2}{N}\,\frac{\bar{y}_2 - \bar{y}}{\sigma_y} = r(z - z_2).$$

It follows therefore that
$$r\sqrt{\frac{2}{\pi}} = \frac{S\,n_x(\bar{y}_x - \bar{y})}{N\sigma_y}\ \text{(above median array)} + \frac{S\,n_x(\bar{y} - \bar{y}_x)}{N\sigma_y}\ \text{(below median array)} + r(2z - z_1 - z_2)\ \text{(for median array)}.$$

* For non-grouped material in which each individual is provided with an accurate measure of x and y this difficulty does not occur, but in such cases r can be found exactly by the product-moment method, and it is just in short series of data that it is desirable to use the most accurate method.

But $z = \dfrac{1}{\sqrt{2\pi}}$. Thus it follows that
$$r = \frac{1}{z_1 + z_2}\,\frac{S\,n_x(\bar{y}_x - \bar{y})}{N\sigma_y} \qquad \text{(iv)},$$
for all arrays, excluding median and taken always with the same sign, right and left of median* (that is, as $\bar{y}_x - \bar{y}$ to the right and $\bar{y} - \bar{y}_x$ to the left).





The advantage of this formula is that except for the median array, which should be as small as feasible, it is independent of the number of arrays; for clearly any group of arrays can be clubbed together without modifying the value of r. There is accordingly no class index correction necessary. Further if r be really zero, there will be positive and negative values of $\bar{y}_x - \bar{y}$ on the right of median, and negative and positive values of $\bar{y} - \bar{y}_x$ on the left of median. Accordingly r in samples will in this case fluctuate about zero, being sometimes positive and sometimes negative; we have thus none of the difficulty which is associated with the correlation ratio, which can often take a quite sensible value, depending on the number of arrays, where the true association is zero.

The disadvantages of the formula are that strictly it applies only to normal distributions of 
frequency, and therefore to linear regression and Gaussian variability. Practically, however, 
these limitations are less important than might be anticipated, and we will illustrate the method 
on cases which cannot be classed as normal in frequency. 


Illustration I. The following values represent the mean place in class of 594 boys grouped in six intelligence classes:

|        | Very Able | Capable | Intelligent | Slow   | Dull   | Very Dull | Totals  |
| Number | 15        | 70      | 216         | 236    | 47     | 10        | 594     |
| Mean   | 4.033     | 12.043  | 18.574      | 28.890 | 38.606 | 46.300    | 23.5875 |

The standard deviation of place in class = 13.2501.

Actually if all boys in the classes had been classified the mean place in class (reduced to classes of 50) should have been 25.0 and the standard deviation 14.4338. But the distribution instead of being rectangular tailed off considerably after class place 37, showing that the duller boys were not presented in due proportion for the intelligence estimation. We have still a very non-Gaussian distribution. Further the median category 216 is unfortunate, it is very large and z is close to $z_2$. We found $z_1 = .225,889$ and $z_2 = .398,878$. Notwithstanding these disadvantages we found r = .6683, while η, corrected for Class Index for intelligence, was .6780. This is very remarkable accordance considering the nature of the material.
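The arithmetic of Illustration I can be retraced with a few lines of Python. This is a minimal sketch of formula (iv), not a reproduction of the author's working: the function name `linear_ratio` and its arguments are illustrative, the Dull count is taken as 47 and the Very Dull mean as 46.300 (the values that reproduce the printed totals 594 and 23.5875), and the standard deviation is taken as 13.2501.

```python
from statistics import NormalDist


def linear_ratio(counts, means, sigma_y, median_index):
    """Formula (iv) for grouped data (illustrative helper, not from the source).

    counts, means: frequency and mean of y for each array, ordered along x.
    median_index:  position of the median array, which is left out of the sum
                   (it must not be the first array).
    Returns (r, z1, z2).
    """
    n_total = sum(counts)
    grand_mean = sum(n * m for n, m in zip(counts, means)) / n_total
    std = NormalDist()  # unit normal curve
    # Ordinates of the normal curve at the start and finish of the median array.
    z1 = std.pdf(std.inv_cdf(sum(counts[:median_index]) / n_total))
    z2 = std.pdf(std.inv_cdf(sum(counts[:median_index + 1]) / n_total))
    total = 0.0
    for i, (n, m) in enumerate(zip(counts, means)):
        if i < median_index:      # left of median: grand mean minus array mean
            total += n * (grand_mean - m)
        elif i > median_index:    # right of median: array mean minus grand mean
            total += n * (m - grand_mean)
    r = total / (n_total * sigma_y * (z1 + z2))
    return r, z1, z2


counts = [15, 70, 216, 236, 47, 10]
means = [4.033, 12.043, 18.574, 28.890, 38.606, 46.300]
r, z1, z2 = linear_ratio(counts, means, sigma_y=13.2501, median_index=2)
print(f"z1 = {z1:.6f}, z2 = {z2:.6f}, r = {r:.4f}")
```

Under these assumptions the two ordinates and the resulting ratio agree with the values quoted in the text to about three decimal places; small differences in the last figure are to be expected from the tables used in the original computation.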

To test the method further I took some cases where it was possible to apply directly the product-moment method, but it seems unnecessary to reproduce the tables in full. I will merely cite references.

Illustration II. 1112 Cases for Statures of Father and Son. See Biometrika, Vol. xvi. p. 219. From means of Son's stature for given stature of Father, there results: r = .5494, where η = .5248. From means of Father's stature for given stature of Son we have: r = .5552, where η = .5268. The difference of the two, .0058, is practically of small importance. The actual value of the correlation found by the product-moment method is: .5206 ± .0147. This shows the degree of divergence between the two processes for material, which forms as normal a bivariate distribution as we are likely to obtain in practice.
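The probable error quoted for the product-moment coefficient can be checked numerically. The sketch below assumes the usual large-sample expression .6745 (1 − r²)/√n, which is not written out in this note; the function name `probable_error_r` is illustrative only.

```python
import math


def probable_error_r(r, n):
    """Large-sample probable error of a product-moment r (assumed formula)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


pe_stature = probable_error_r(0.5206, 1112)   # Illustration II
gap = round(0.5552 - 0.5494, 4)               # difference of the two linear ratios
print(f"probable error = {pe_stature:.4f}, difference = {gap}")
```

On this assumption the probable error for 1112 pairs comes to .0147, matching the figure given, and the two linear ratios differ from each other by less than half of it.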


Illustration III. 2922 Cases of the contemporaneous Heights of the Barometer at Southampton and Laudale. We have here a case in which the distribution is very sensibly non-normal. The distribution will be found in the table on p. 291 of the present issue. Taking first the means of arrays of Laudale for constant Southampton we have: r = .7927, where η = .7838, and secondly from the means of arrays of Southampton for constant Laudale we find: r = .7733, where η = .7808. The mean of the two ‘linear’ ratios is .7830. The actual value of the correlation as found by the product-moment method is .7802 ± .0050.

* Set as a problem in the London Honours Examination, B.Sc. Statistics, June, 1925.

It will be clear therefore that we do not get results substantially worse when the material is markedly skew than when the material is practically normal, notwithstanding the method, based on the normal curve, that we have applied to distribute the material in the median section.

The method will, I think, be of value in cases where one variate is qualitative and the other quantitative. In this case the custom is to compute the correlation ratio; the present method, when the correlation ratio η is of order $\bar{\eta}$ (the mean value of η for no association), will throw some light on whether such η's are of significance or not; for our present coefficient should be zero if the value of η be merely the result of random sampling. It can serve in a limited way as a control on η. Its value will undoubtedly be increased, when the probable error of r found in this way has been determined. This I hope to provide shortly.


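Instead of dividing our material into those individuals above and those below the mean we can distribute it into four classes instead of two, by reference to both means. This leads to an interesting series of formulae for the correlation, which I now proceed to deal with.

(2) “Der deutsche Korrelations-Index” and some others. Let us find the centroid of the four volumes into which the normal surface
$$z = \frac{N}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\; e^{-\frac{1}{2(1 - r^2)}\left(\frac{x^2}{\sigma_1^2} - \frac{2rxy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right)}$$
is divided by dichotomic planes through the x, z and y, z pairs of axes. We may call these quadrants, as marked in the diagram, (1), (2), (3) and (4), the corresponding frequencies $n_1$, $n_2$, $n_3$ and $n_4$, and the coordinates of the plans of the centroids $\bar{x}_1, \bar{y}_1$; $\bar{x}_2, \bar{y}_2$; $\bar{x}_3, \bar{y}_3$; and $\bar{x}_4, \bar{y}_4$ respectively. Clearly for a normal surface, we have $n_1 = n_2$, $\bar{x}_2 = -\bar{x}_1$, $\bar{y}_2 = -\bar{y}_1$ and $n_3 = n_4$, $\bar{x}_4 = -\bar{x}_3$, $\bar{y}_4 = -\bar{y}_3$. We may, however, with a view to later applications maintain the fuller nomenclatures.

[Diagram: axes through the origin 0, with x horizontal (−x to the left, +x to the right) and +y downwards; quadrant (2), frequency $n_2$, upper left; quadrant (3), frequency $n_3$, upper right; quadrant (4), frequency $n_4$, lower left; quadrant (1), frequency $n_1$, lower right.]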

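Clearly:
$$n_1\bar{x}_1 = \int_0^\infty\!\!\int_0^\infty xz\,dx\,dy = \frac{N}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\int_0^\infty\!\!\int_0^\infty x\,e^{-\frac{1}{2(1 - r^2)}\left(\frac{x^2}{\sigma_1^2} - \frac{2rxy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right)}\,dx\,dy,$$
$$n_1\bar{y}_1 = \int_0^\infty\!\!\int_0^\infty yz\,dx\,dy = \frac{N}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\int_0^\infty\!\!\int_0^\infty y\,e^{-\frac{1}{2(1 - r^2)}\left(\frac{x^2}{\sigma_1^2} - \frac{2rxy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right)}\,dx\,dy.$$

Hence:
$$n_1\left(\frac{\bar{x}_1}{\sigma_1} - r\frac{\bar{y}_1}{\sigma_2}\right) = \frac{N}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\int_0^\infty\!\!\int_0^\infty \left(\frac{x}{\sigma_1} - r\frac{y}{\sigma_2}\right) e^{-\frac{1}{2(1 - r^2)}\left(\frac{x}{\sigma_1} - r\frac{y}{\sigma_2}\right)^2 - \frac{y^2}{2\sigma_2^2}}\,dx\,dy.$$

Let us now put $X = \dfrac{1}{\sqrt{1 - r^2}}\left(\dfrac{x}{\sigma_1} - r\dfrac{y}{\sigma_2}\right)$ and integrate first with regard to X:
$$n_1\left(\frac{\bar{x}_1}{\sigma_1} - r\frac{\bar{y}_1}{\sigma_2}\right) = \frac{N\sqrt{1 - r^2}}{2\pi\sigma_2}\int_0^\infty \left[-e^{-\frac{1}{2}X^2}\right]_{X = -\frac{ry}{\sigma_2\sqrt{1 - r^2}}}^{\infty} e^{-\frac{y^2}{2\sigma_2^2}}\,dy = \frac{N(1 - r^2)}{2\sqrt{2\pi}}.$$
Similarly:
$$n_1\left(\frac{\bar{y}_1}{\sigma_2} - r\frac{\bar{x}_1}{\sigma_1}\right) = \frac{N(1 - r^2)}{2\sqrt{2\pi}}.$$

Or, solving, we find, what follows also from the symmetry (i.e. $\bar{x}_1/\sigma_1 = \bar{y}_1/\sigma_2$):
$$\frac{n_1\bar{x}_1}{\sigma_1} = \frac{N(1 + r)}{2\sqrt{2\pi}} = \frac{n_1\bar{y}_1}{\sigma_2} \qquad \text{(v)}.$$
Again:
$$n_3\bar{x}_3 = \int_0^\infty\!\!\int_{-\infty}^{0} xz\,dy\,dx = \frac{N}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\int_0^\infty\!\!\int_0^\infty x\,e^{-\frac{1}{2(1 - r^2)}\left(\frac{x^2}{\sigma_1^2} + \frac{2rxy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right)}\,dx\,dy,$$
by changing y to −y. Thus we reach the same result as before by changing r to −r; or
$$\frac{n_3\bar{x}_3}{\sigma_1} = \frac{N(1 - r)}{2\sqrt{2\pi}} = -\frac{n_3\bar{y}_3}{\sigma_2} \qquad \text{(vi)}.$$
Now let $x_s$, $y_s$ be observations falling in the sth quadrant. Then
$$S(x_1) = n_1\bar{x}_1 = \frac{N\sigma_1(1 + r)}{2\sqrt{2\pi}},\qquad S(x_2) = n_2\bar{x}_2 = -\frac{N\sigma_1(1 + r)}{2\sqrt{2\pi}},$$
$$S(x_3) = n_3\bar{x}_3 = \frac{N\sigma_1(1 - r)}{2\sqrt{2\pi}},\qquad S(x_4) = n_4\bar{x}_4 = -\frac{N\sigma_1(1 - r)}{2\sqrt{2\pi}},$$
$$S(y_1) = n_1\bar{y}_1 = \frac{N\sigma_2(1 + r)}{2\sqrt{2\pi}},\qquad S(y_2) = n_2\bar{y}_2 = -\frac{N\sigma_2(1 + r)}{2\sqrt{2\pi}},$$
$$S(y_3) = n_3\bar{y}_3 = -\frac{N\sigma_2(1 - r)}{2\sqrt{2\pi}},\qquad S(y_4) = n_4\bar{y}_4 = \frac{N\sigma_2(1 - r)}{2\sqrt{2\pi}} \qquad \text{(vi bis)}.$$

These equations by elimination of $\sigma_1$ and $\sigma_2$ give us a variety of methods for finding r on the assumption of normality.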


For example:
$$r = \frac{S(x_1) - S(x_3)}{S(x_1) + S(x_3)} = \frac{S(x_2) - S(x_4)}{S(x_2) + S(x_4)} \qquad \text{(vii a \& b)},$$
$$r = \frac{S(y_1) - S(y_4)}{S(y_1) + S(y_4)} = \frac{S(y_2) - S(y_3)}{S(y_2) + S(y_3)} \qquad \text{(viii a \& b)},$$
whence:
$$r = \frac{S(x_1) - S(x_2) - S(x_3) + S(x_4)}{S(x_1) - S(x_2) + S(x_3) - S(x_4)} \qquad \text{(ix a)},$$
$$r = \frac{S(y_1) - S(y_2) + S(y_3) - S(y_4)}{S(y_1) - S(y_2) - S(y_3) + S(y_4)} \qquad \text{(ix b)}.$$

Or, again we may take:
$$r = \frac{S(x_1) - S(x_2) - S(x_3) + S(x_4) + S(y_1) - S(y_2) + S(y_3) - S(y_4)}{S(x_1) - S(x_2) + S(x_3) - S(x_4) + S(y_1) - S(y_2) - S(y_3) + S(y_4)} \qquad \text{(x)},$$
or still again:
$$r = \frac{1}{2}\left\{\frac{S(x_1) - S(x_3)}{S(x_1) + S(x_3)} + \frac{S(x_2) - S(x_4)}{S(x_2) + S(x_4)}\right\} \qquad \text{(xi)},$$
$$r = \frac{1}{2}\left\{\frac{S(y_1) - S(y_4)}{S(y_1) + S(y_4)} + \frac{S(y_2) - S(y_3)}{S(y_2) + S(y_3)}\right\} \qquad \text{(xii)},$$
$$r = \frac{1}{2}\left\{\frac{S(x_1) - S(x_2) - S(x_3) + S(x_4)}{S(x_1) - S(x_2) + S(x_3) - S(x_4)} + \frac{S(y_1) - S(y_2) + S(y_3) - S(y_4)}{S(y_1) - S(y_2) - S(y_3) + S(y_4)}\right\} \qquad \text{(xiii)}.$$


The right-hand side of the last result has been termed by Lenz “Der deutsche Korrelations-Index*.” He appears to think that it has some special merits which he does not, however, reveal, beyond stating that it is easier to compute than the usual correlation coefficient. As, however, he appears to compute the usual correlation coefficient by subtracting each deviation from its mean, there is no wonder that he has found the process laborious†. Lenz's “deutsche Korrelations-Index” is the usual correlation coefficient when the distribution is normal, or approximately normal, a result he does not seem to have been aware of. It is really only valid for such cases. What he then has reached is a German method of finding the usual correlation coefficient, and whether it is better or worse than any other method will depend on its probable error. This he has not determined. It is true he does give a probable error for each value cited of the “deutsche Korrelations-Index” but it is calculated from the formula for the probable error of the product-moment coefficient, and therefore is entirely fallacious. The probable error as in Sheppard's formula, which is far simpler of application than Lenz's, will probably be found to average 30 % to 50 % more than that of the product-moment method, which has the minimum probable error of all methods discoverable. Lenz asks why sums of squares and products should be used when it is so much easier to calculate sums of deviations‡. The answer is an easy one, it is done because the method is more accurate.
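The claim that these first-power formulae return the ordinary correlation coefficient for normal material can be tried on artificial data. The following is a sketch under stated assumptions: the sample is simulated from a bivariate normal with a chosen correlation of .6, the quadrant numbering is that of the text (1: +x, +y; 2: −x, −y; 3: +x, −y; 4: −x, +y), the function names are illustrative, and the second function takes the index as one half of the sum of the two ratios, the factor that follows from (vi bis).

```python
import math
import random


def quadrant_sums(pairs):
    """Sums of x- and y-deviations from the means, by quadrant."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sx = {1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}
    sy = {1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}
    for x, y in pairs:
        dx, dy = x - mean_x, y - mean_y
        if dx >= 0 and dy >= 0:
            q = 1
        elif dx < 0 and dy < 0:
            q = 2
        elif dx >= 0:
            q = 3
        else:
            q = 4
        sx[q] += dx
        sy[q] += dy
    return sx, sy


def r_from_x_sums(sx):
    """Formula (vii a)."""
    return (sx[1] - sx[3]) / (sx[1] + sx[3])


def korrelations_index(sx, sy):
    """Mean of the x-ratio and the y-ratio built from all four quadrants."""
    fx = (sx[1] - sx[2] - sx[3] + sx[4]) / (sx[1] - sx[2] + sx[3] - sx[4])
    fy = (sy[1] - sy[2] + sy[3] - sy[4]) / (sy[1] - sy[2] - sy[3] + sy[4])
    return 0.5 * (fx + fy)


rng = random.Random(1927)          # fixed seed so the run is repeatable
true_r = 0.6
pairs = []
for _ in range(100_000):
    x = rng.gauss(0.0, 1.0)
    y = true_r * x + math.sqrt(1.0 - true_r ** 2) * rng.gauss(0.0, 1.0)
    pairs.append((x, y))

sx, sy = quadrant_sums(pairs)
r_vii = r_from_x_sums(sx)
r_index = korrelations_index(sx, sy)
print(f"(vii a): {r_vii:.3f}   index: {r_index:.3f}")
```

Both estimates settle near the chosen correlation for normal material; nothing in this sketch speaks to their probable error, nor to their behaviour when the distribution is not normal.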


The advantage of the correlation coefficient lies—whatever be the distribution—in its 
geometrical properties, i.e. its relation to the slope of the best-fitting straight line and the 
expression by aid of this coefficient of the mean square deviation from this line. The “deutsche 
Korrelations-Index” is simply this coefficient if the distribution be practically normal. If the 
distribution be not normal, nobody can say what the coefficient means; it does not as far as 
I can see express any fundamental geometrical properties of the frequency distribution. 


* Lenz gives no proof whatever of his formula, he simply writes it down (Archiv für Hygiene, Bd. xc. S. 147) after a criticism of the product-moment correlation coefficient, which is wholly fallacious.

† The first correlation coefficients calculated by the product-moment method were those given
in my third memoir on evolution of 1896; the product-moments as well as the standard deviations 
were on that occasion calculated, as every trained mathematician would calculate them, about an 
arbitrary origin and then transferred to the mean. It is therefore somewhat amusing to read in 
a recent American text-book that this great discovery of reduction to the mean was made by an 
American in 1917! Every mathematician and engineer has used such reduction since “radii of 
gyration” and ‘‘principal axes” were discovered in the eighteenth century. 

‡ I do not propose to criticise Lenz's sociological hypotheses, but I would criticise his statistical methods. He has eleven correlation tables. Of these tables one has 54 and one 20 entries, the remainder have 8, 10, 9, 8, 10, 10, 8, 10 and 14 entries respectively. To all these he applies the probable error of the correlation coefficient on the assumption that he is dealing with a large sample. It may be safely said that it is totally inapplicable to all the numbers dealt with, excepting possibly the first two, but here the intensity of the correlation is such, −.86 for the 54 (Bayern) and −.72 for the 20 (Deutschland), that the probable error as found for large samples is without meaning. The only case with sufficient pairs to give a reasonable result for the correlation is that of Bayern, and here the German correlation coefficient is close to the product-moment value.

Returning to our Equations (vi bis) we note that
$$S(x_1) - S(x_3) = \frac{N\sigma_1 r}{\sqrt{2\pi}} = S(x_4) - S(x_2),$$
$$S(y_1) - S(y_4) = \frac{N\sigma_2 r}{\sqrt{2\pi}} = S(y_3) - S(y_2),$$
$$S(x_1) + S(x_3) = \frac{N\sigma_1}{\sqrt{2\pi}} = -\{S(x_2) + S(x_4)\},$$
$$S(y_1) + S(y_4) = \frac{N\sigma_2}{\sqrt{2\pi}} = -\{S(y_2) + S(y_3)\} \qquad \text{(xiv)}.$$
Accordingly:
$$\frac{S(x_1) - S(x_3)}{S(y_1) + S(y_4)} = r\frac{\sigma_1}{\sigma_2} = \frac{S(x_2) - S(x_4)}{S(y_2) + S(y_3)} \qquad \text{(xv)},$$
$$\frac{S(y_1) - S(y_4)}{S(x_1) + S(x_3)} = r\frac{\sigma_2}{\sigma_1} = \frac{S(y_2) - S(y_3)}{S(x_2) + S(x_4)} \qquad \text{(xvi)}.$$


We have thus two expressions from each of which, or from the mean of each pair, the two regression coefficients can be found. These expressions give us a series of values again for the correlation coefficient such as:
$$r^2 = \frac{S(x_1) - S(x_3)}{S(x_1) + S(x_3)}\cdot\frac{S(y_1) - S(y_4)}{S(y_1) + S(y_4)} = \frac{S(x_2) - S(x_4)}{S(x_2) + S(x_4)}\cdot\frac{S(y_2) - S(y_3)}{S(y_2) + S(y_3)} \qquad \text{(xvii a \& b)},$$
or, r may be taken as the fourth root of the product of all four ratios. But again:
$$\frac{\sigma_1}{\sigma_2} = \frac{S(x_1)}{S(y_1)} = \frac{S(x_2)}{S(y_2)} = -\frac{S(x_3)}{S(y_3)} = -\frac{S(x_4)}{S(y_4)} \qquad \text{(xviii)}.$$
Hence:
$$r = \frac{1 - \dfrac{S(x_3)}{S(x_1)}}{1 - \dfrac{S(y_3)}{S(y_1)}} = \frac{1 - \dfrac{S(x_4)}{S(x_2)}}{1 - \dfrac{S(y_4)}{S(y_2)}} \qquad \text{(xix a)},$$
$$r = \frac{1 + \dfrac{S(y_3)}{S(y_1)}}{1 + \dfrac{S(x_3)}{S(x_1)}} = \frac{1 + \dfrac{S(y_4)}{S(y_2)}}{1 + \dfrac{S(x_4)}{S(x_2)}} \qquad \text{(xix b)}.$$

Further we have:
$$\sigma_1\sqrt{1 - r^2} = 2\sqrt{2\pi}\,\frac{\sqrt{S(x_1)\,S(x_3)}}{N} = 2\sqrt{2\pi}\,\frac{\sqrt{S(x_2)\,S(x_4)}}{N} \qquad \text{(xx a)},$$
$$\sigma_2\sqrt{1 - r^2} = 2\sqrt{2\pi}\,\frac{\sqrt{S(y_1)\,S(y_4)}}{N} = 2\sqrt{2\pi}\,\frac{\sqrt{S(y_2)\,S(y_3)}}{N} \qquad \text{(xx b)}.$$
It is thus possible to find the mean square deviation of the arrays about the regression line. Since r is known, we can also find $\sigma_1$ and $\sigma_2$.


The fact that from the eight sums we have eight equations to find three quantities, $\sigma_1$, $\sigma_2$ and r, shows that we have ample evidence to measure the degree of approach to a normal distribution. For we ought to have
$$S(x_1) = -S(x_2),\quad S(x_3) = -S(x_4),\quad S(y_1) = -S(y_2),\quad S(y_3) = -S(y_4),$$
but these equations will be rarely accurately fulfilled, especially in the case of small samples. It is therefore advisable to average out the different values obtained for the constants.
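How the eight sums determine the three constants can be illustrated on exact values. The sketch below builds the sums from (vi bis) for hypothetical constants (N = 1112, σ₁ = 2.7, σ₂ = 2.6, r = .52, all chosen only for illustration) and then recovers r, the two regression coefficients and the two standard deviations from (vii a), (xv), (xvi) and (xx); the form of (xx b) used pairs S(y₁) with S(y₄), whose product is positive. Function names are illustrative.

```python
import math


def normal_quadrant_sums(n, sigma1, sigma2, r):
    """Theoretical quadrant sums for a normal surface, from (vi bis)."""
    c = n / (2.0 * math.sqrt(2.0 * math.pi))
    sx = {1: c * sigma1 * (1 + r), 2: -c * sigma1 * (1 + r),
          3: c * sigma1 * (1 - r), 4: -c * sigma1 * (1 - r)}
    sy = {1: c * sigma2 * (1 + r), 2: -c * sigma2 * (1 + r),
          3: -c * sigma2 * (1 - r), 4: c * sigma2 * (1 - r)}
    return sx, sy


def recover_constants(n, sx, sy):
    """Work back from the sums to r, the regression slopes and the sigmas."""
    r = (sx[1] - sx[3]) / (sx[1] + sx[3])                 # (vii a)
    slope_x_on_y = (sx[1] - sx[3]) / (sy[1] + sy[4])       # (xv):  r * sigma1 / sigma2
    slope_y_on_x = (sy[1] - sy[4]) / (sx[1] + sx[3])       # (xvi): r * sigma2 / sigma1
    k = 2.0 * math.sqrt(2.0 * math.pi) / n
    sigma1 = k * math.sqrt(sx[1] * sx[3]) / math.sqrt(1.0 - r * r)   # (xx a)
    sigma2 = k * math.sqrt(sy[1] * sy[4]) / math.sqrt(1.0 - r * r)   # (xx b)
    return r, slope_x_on_y, slope_y_on_x, sigma1, sigma2


sx, sy = normal_quadrant_sums(1112, 2.7, 2.6, 0.52)
r, b_xy, b_yx, s1, s2 = recover_constants(1112, sx, sy)
print(f"r = {r:.4f}, sigma1 = {s1:.4f}, sigma2 = {s2:.4f}")
```

With observed rather than theoretical sums the several routes will not agree exactly, which is the reason given in the text for averaging the values they yield.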

A difficulty, however, arises as in our previous discussion. It is only possible to deal with individual measurements in the case of relatively small samples*. Most observations are grouped, and there arises the difficulty of distributing the individuals in the cell (often one of large content) in which the mean lies, and in the arrays through which the mean dichotomic planes pass. If we assume a normal surface and endeavour to integrate out the volumes in question, we obtain long series in r and the tetrachoric functions, so that the final value of r can only be ascertained by approximation. It is worth while determining what sort of accuracy we get by computing the correlation by these formulae on a long series, which diverges not too much from normality and where the product-moment coefficient is known. I will take the table for correlation of stature in Father and Son provided by Professor Yasukawa on p. 218 of the current issue*. I have surrounded with heavy lines the column and row in which the mean statures of Father and Son lie: see Table, p. 466. It is clear that to obtain our summations, i.e. S(x) and S(y) for the quadrants, we must proportion the central column, the central row, and their intersecting cell. As I have said, the true proportioning depends on a knowledge of r itself and, even supposing we take an approximate value of r, is extraordinarily laborious. I shall therefore content myself with assuming that the contents of the cells are rectanguloid blocks and taking as proportions the volumes standing upon the corresponding areas. Even so, the total labour of proportioning and determining the means of the quadrants is three or four times as great as that of finding the product-moment correlation, and does not provide the other requisite constants of the distribution.

* Since these methods involve knowing the means and the actual measurements of each individual, we can classify at once into four groups and use Sheppard's result, which does not involve adding measurements but only counting the number of individuals. But here again product-moment methods would be of higher accuracy, and ultimately (with grouping) shorter for long series.


Now the central line of the emphasised column is at 5′ 9″ stature of Son, or the mean (5′ 8″.65) is at −.35 of our unit, an inch, from this central line. Similarly the mean stature of Father is −.30 from the central line of the emphasised row. Accordingly the coordinates of the mean referred to our working origin O are −.35 and −.30. We have first to proportion the contents, 25.75, of the central cell into four groups, a, b, c, d. The areas of the bases of a, b, c, d are as .80 × .85, .15 × .20, .85 × .20, and .80 × .15, or as .6800, .0300, .1700 and .1200 respectively. The frequencies will be 17.5100, .7725, 4.3775, 3.0900. The coordinates referred to C will be +.425, +.400 for a, −.075, −.10 for b, +.425, −.10 for c and −.075, +.400 for d. These provide the following contributions to the first moments of the four quadrants about the axes through C:

$a_x = +7.441,7500$, $a_y = +7.004,00$,
$b_x = -.057,9375$, $b_y = -.077,25$,
$c_x = +1.860,4375$, $c_y = -.437,75$,
$d_x = -.231,7500$, $d_y = +1.236,00$.

[Figure: the central cell divided by the axes through the mean C into the parts a, b, c, d, with the working origin O; the contributions of the adjoining portions of the central column and row are marked $Y_{23}, X_{23}$; $Y_{32}, X_{32}$; $Y_{31}, X_{31}$; $Y_{13}, X_{13}$; etc.]

We have next to obtain with the same form of proportioning the contributions to the x and y moments of the emphasised column, now omitting the central cell.


* Our results are corrected for grouping. For Professor Yasukawa’s purposes those corrections were 
not used. 


[Table, p. 466: Correlation between Father's Stature and Son's Stature, with the column and row containing the means emphasised. The entries of the table are not recoverable from the scan.]


The notation for these contributions is given in the third figure, the quadrants being
numbered 2, 3, 1, 4. Thus $Y_{23}$ signifies the moment round the centroid axis of x, which is
contributed to quadrant 2 by the column between quadrants 2 and 3; $X_{31}$ is the moment round
the centroid axis of y which is contributed to quadrant 3 by the row which lies between
quadrants 3 and 1. And so on. Clearly $X_{23}$, $X_{32}$, $X_{14}$ and $X_{41}$ are very simply found. We have
only to proportion the total of the parts of the central column in the ratio .15 to .85, and
multiply by the arms −.075 and +.425 as the case may be. Again $Y_{24}$, $Y_{31}$, $Y_{42}$ and $Y_{13}$ admit
of being found in like manner. In the case of $Y_{23}$, $Y_{32}$, $Y_{14}$, $Y_{41}$, $X_{24}$, $X_{31}$, $X_{42}$ and $X_{13}$, it is
best to compute the moments of the portions of the rows and columns first about the working
axes, and then reduce these to the centroid, and finally proportion them in the ratio of the given
parts they are of the total column or row. Thus consider $Y_{23}$. We take the frequencies of upper
portion of central column:








Frequency | Deviation | Product
1 | −7 | −7
— | −6 | —
5 | −5 | −25
5.5 | −4 | −22
11.75 | −3 | −35.25
19.75 | −2 | −39.50
17.25 | −1 | −17.25
60.25 | | −146.0

Thus −146.0 = value about O, and −146.0 + .30 × 60.25 = −127.925 = value about C.

Proportioning in the ratios of .15 and .85, we find
$Y_{23} = -19.18875$, $Y_{32} = -108.73625$.
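The arithmetic of this step can be retraced with a short sketch. It assumes that the blank entry at deviation −6 is a zero frequency (the column total 60.25 is consistent with that), and the variable names are illustrative only, not Pearson's.

```python
from fractions import Fraction as F

# Frequencies of the upper portion of the central column and their
# deviations from the working origin O, as quoted in the text.
# Assumption: the empty cell at deviation -6 is a zero frequency.
freqs = [F("1"), F("0"), F("5"), F("5.5"), F("11.75"), F("19.75"), F("17.25")]
devs = [-7, -6, -5, -4, -3, -2, -1]

total = sum(freqs)                                   # 60.25
moment_about_O = sum(f * d for f, d in zip(freqs, devs))

# Transfer from the working origin O to the centroid C (shift of .30).
moment_about_C = moment_about_O + F("0.30") * total

# Proportion the column between quadrants 2 and 3 in the ratio .15 to .85.
Y23 = F("0.15") * moment_about_C
Y32 = F("0.85") * moment_about_C

print(float(moment_about_O), float(moment_about_C), float(Y23), float(Y32))
```

Exact fractions are used so that the five-decimal values quoted in the text can be compared without rounding noise.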
In this manner all the column and row contributions were found, straightforwardly, but 
laboriously. We have: 





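\[
\begin{aligned}
Y_{23} &= -19.18875, & X_{23} &= -.6778125,\\
Y_{32} &= -108.73625, & X_{32} &= +21.7653125,\\
Y_{31} &= -1.26500, & X_{31} &= +37.177500,\\
Y_{13} &= +20.24000, & X_{13} &= +148.710000,\\
Y_{41} &= +24.1425, & X_{41} &= -.776250,\\
Y_{14} &= +136.8075, & X_{14} &= +24.926250,\\
Y_{24} &= -1.41000, & X_{24} &= -26.415000,\\
Y_{42} &= +22.56000, & X_{42} &= -105.660000.
\end{aligned}
\]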

The next stage was to determine the moments of the quadrants apart from the proportioned
central column, central row and central cell. These were found about the working origin, and
then reduced to the mean. We will call them $S_i(x)$, $S_i(y)$, where i refers to the quadrants.
I found:

\[
\begin{aligned}
S_1(x) &= +765.3625, & S_1(y) &= +741.775,\\
S_2(x) &= -908.7875, & S_2(y) &= -936.175,\\
S_3(x) &= +203.7750, & S_3(y) &= -155.550,\\
S_4(x) &= -167.7125, & S_4(y) &= +265.175.
\end{aligned}
\]

We are now prepared to combine our results so as to obtain the first moments of each 

quadrant about both axes. The formulae are: 


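\[
\begin{aligned}
S(x_1) &= S_1(x) + X_{13} + X_{14} + a_x,\\
S(x_2) &= S_2(x) + X_{23} + X_{24} + b_x,\\
S(x_3) &= S_3(x) + X_{32} + X_{31} + c_x,\\
S(x_4) &= S_4(x) + X_{42} + X_{41} + d_x,\\
S(y_1) &= S_1(y) + Y_{13} + Y_{14} + a_y,\\
S(y_2) &= S_2(y) + Y_{23} + Y_{24} + b_y,\\
S(y_3) &= S_3(y) + Y_{32} + Y_{31} + c_y,\\
S(y_4) &= S_4(y) + Y_{42} + Y_{41} + d_y.
\end{aligned}
\]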
The reader who has examined the magnitudes of the proportioned terms given above will see
how impossible it is to neglect the central row, column or cell*. Their contributions cannot be
disregarded, and, even if our rough approximations are allowable, they make the work far more
laborious for grouped tables (the usual condition of material) than directly finding the
product-moment value.

We deduce:

\[
\begin{aligned}
S(x_1) &= +946.94050, & S(y_1) &= +905.82650,\\
S(x_2) &= -935.93825, & S(y_2) &= -956.85100,\\
S(x_3) &= +264.57825, & S(y_3) &= -265.98900,\\
S(x_4) &= -274.38050, & S(y_4) &= +313.11350.
\end{aligned}
\]

These give a position of the true mean at −.349 and −.304, i.e. they do not add up to exactly
zero, because we only used the mean statures to hundredths and not thousandths of an inch.

We can now determine the values as found by the formulae of the preceding paper. 


First, the correlation coefficient r. We have:

r = .5632 (from vii a), r = .5466 (from vii b),
r = .4863 (from viii a), r = .5650 (from viii b),
r = .5549 (from ix a), r = .5257 (from ix b),
r = .5402 (from x),
r = .5549 (from xi), r = .5256 (from xii),
r = "Der deutsche Korrelations-Index" = .5403 (from xiii),
r = .5233 (from xvii a), r = .5557 (from xvii b),
r = .5355 = .5114 = etc. (from xix a),
r = .5531 = .5583 = etc. (from xix b).

Here are 13 values at least of r†, determined from linear moments. Which does the reader
prefer? The true correlation coefficient as given by the product-moment formula, i.e. .5206 ± .0147,
is unique; it depends on no special form of distribution, and has a perfectly clear geometrical
meaning. It gives no choice to the user. Here we have a great number of variants, in the
bulk significantly higher than the coefficient of correlation. They have no geometrical
interpretation except as approximations to the correlation coefficient, and this, as far as we yet know,
only for the case of normal frequency. At present no probable errors have been determined,
and all we can at present say is, that if we assume normal correlation, they will be higher than
that of the correlation coefficient found by product-moment. The work of deducing one or other
of them is considerably more laborious and yet less exact than that of the product-moment
method‡, for our proportioning is only rough and we do not know, as in the latter case, what
(if any) corrections should be made for “grouping.” No wise computer would think of using
any of these linear moment methods for a long series, in which both variates were measurable
characters. On the other hand, they may be serviceable as controls on other processes when
one variate is qualitative, as shown in the example on p. 460. If there be only a few observations,
taken to a high degree of accuracy, then these methods are easy of application; but it is
precisely in such cases that we ought to apply the product-moment method, for it gives us the
least probable error for the given size of the sample. What is more, it provides in the course
of the work additional constants, which are very generally needed.

* If we neglect them we find from (vii a & b) and (viii a & b) .5797, .6884, .5254 and .7150
respectively! This shows how important proportioning is, and better proportioning might possibly
reduce the results still closer to the true value.

† It is not, perhaps, legitimate to consider (ix a) and (xi) or (ix b) and (xii) as giving independent
values, and even some of the others can hardly be looked upon as more than averaging processes.

‡ I assume that the computer, of course, knows how to apply the product-moment correctly. He
might spend any amount of time over 1115 observations, if he subtracted the characters of each from
their means!
It may be said that this additional information would be provided by our Equations (xv),
(xvi), (xviii) and (xx a & b). Let us examine this. The unique values of the product-moment
regression coefficients are .5258 and .5104, and the standard deviations are $\sigma_1 = 2.7289$ and
$\sigma_2 = 2.7018$.

From (xv): $r\sigma_1/\sigma_2$ = .5598 = .5410.

From (xvi): $r\sigma_2/\sigma_1$ = .4892 = .5708.

These results are inter se discordant, and their means .5504, .5300 are considerably in excess
of the true values. The ratio $\sigma_1/\sigma_2$ has for its true value 1.0100. From (xviii), we have for its
values:

$\sigma_1/\sigma_2$ = 1.0325 = .97075 = 1.3100 = .63246,

a very discordant series, the mean of which, .9864, would be interpreted as Sons being less
variable than Fathers, a result contrary to what has been observed.

Lastly, before applying (xx a & b), we must adopt some value of r from our discordant returns.
Let us take the value .5403 provided by the “deutsche Korrelations-Index.” We find from (xx a):

$\sigma_1$ = 2.682 = 2.715. (True value 2.7289.)
$\sigma_2$ = 2.853 = 2.7038. (True value 2.7018.)


One value of $\sigma_1$ is less than one of $\sigma_2$, and another value of $\sigma_1$ is greater than one of $\sigma_2$, and the
mean values $\sigma_1 = 2.698$ and $\sigma_2 = 2.778$ are far from acceptable. As in any particular example
we do not know which are the better of the alternatives, we must either guess at random or
take means. It will thus be seen that although other constants, as well as r, can be determined
by linear moment methods, these lead to divergent results, and results not in agreement with
the true values. It is conceivable that with some other system of proportionment, more
self-accordant results might be obtained, and these more in agreement with direct method values.
But we doubt this, at least for any case which has not an exact normal distribution. And even
if such a method of proportioning could be found, we believe the labour of computing would be
still more considerable. It is hard to believe that even a reasonable national pride will be able
to justify the use of “der deutsche Korrelations-Index.” As far as I am aware, nobody has
given the product-moment coefficient of correlation a national name, and I trust they will not.
National feeling has little to do with true science.


Further consideration of the Integral $\int_0^\theta \cos^{n+1}\theta\,d\theta$ for large values of n.


By JOHN WISHART, M.A., B.Sc. 


The method of my earlier paper* works very well for the range of n illustrated therein; i.e.
for n above 100. For lower values of n the approximation is not so good. That, however, in the
long run, does not matter, as the integral in question, for any value of n below 100, will
ultimately be given by the Tables of the Incomplete Beta-Function now in preparation. But the
calculation of these Tables for the full range of the arguments is naturally taking some little
time. In the meantime recourse may be had to another approximate expansion for the integral,
which has been discovered since the above paper was written. It has been found to give excellent
results, over the full range of 3σ from the mode, for n as low as 8. For such low values it is,
however, shorter to expand by the binomial theorem, or to use one of the well-known expansions
given in trigonometrical text-books.


* Biometrika, Vol. xvi. pp. 68—78. 
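Briefly the method is as follows:

Replace n+1 by N, and adopt the transformation $t = \tan\tfrac12\theta$. Then
\[
\int_0^\theta \cos^N\theta\,d\theta = 2\int_0^t \left(\frac{1-t^2}{1+t^2}\right)^N \frac{dt}{1+t^2} \tag{1}
\]
\[
= 2\int_0^t e^{-2N\left(t^2+\frac13 t^6+\frac15 t^{10}+\ldots\right)}\,\frac{dt}{1+t^2}.
\]
Let $t = \dfrac{x}{2\sqrt N}$. Then
\[
\int_0^\theta \cos^N\theta\,d\theta = \frac{1}{\sqrt N}\int_0^x e^{-\frac12 x^2-\frac{x^6}{96N^2}-\frac{x^{10}}{2560N^4}-\ldots}\,\frac{dx}{1+\frac{x^2}{4N}}
\]
\[
= \frac{1}{\sqrt N}\int_0^x e^{-\frac12 x^2}\left(1-\frac{x^2}{4N}+\frac{x^4}{16N^2}-\ldots\right)\left(1-\frac{x^6}{96N^2}-\frac{x^{10}}{2560N^4}+\frac{x^{12}}{18432N^4}+\ldots\right)dx
\]
\[
= \frac{1}{\sqrt N}\int_0^x e^{-\frac12 x^2}\left[1-\frac{x^2}{4N}+\frac{x^4}{16N^2}-\frac{x^6}{32N^2}\left(\frac13+\frac{1}{2N}\right)+\frac{x^8}{128N^3}\left(\frac13+\frac{1}{2N}\right)-\frac{x^{10}}{960N^4}+\frac{x^{12}}{18432N^4}\right]dx \tag{2}
\]
neglecting terms of higher order than $1/N^4$.

Expressing in terms of the Incomplete Normal Moment Functions as before, we have
\[
I_\theta(N) = \sqrt{\frac{2}{N}}\,\frac{\Gamma\{\frac12(N+2)\}}{\Gamma\{\frac12(N+1)\}}\left[m_0(x)-\frac{1}{4N}m_2(x)+\frac{3}{16N^2}m_4(x)-\frac{15}{32N^2}\left(\frac13+\frac{1}{2N}\right)m_6(x)+\frac{105}{128N^3}\left(\frac13+\frac{1}{2N}\right)m_8(x)-\frac{63}{64N^4}m_{10}(x)+\frac{1155}{2048N^4}m_{12}(x)\right] \tag{3}
\]
where $I_\theta(N)$ is the probability integral of the symmetrical frequency curves, as defined in the
previous paper.

Hence
\[
I_\theta(N) = \sqrt{\frac{2}{N}}\,\frac{\Gamma\{\frac12(N+2)\}}{\Gamma\{\frac12(N+1)\}}\left\{\psi_0(x)-\frac1N\psi_1(x)+\frac{1}{N^2}\psi_2(x)-\frac{1}{N^3}\psi_3(x)+\frac{1}{N^4}\psi_4(x)\right\} \tag{4}
\]
where
\[
\begin{aligned}
\psi_0(x) &= m_0(x) = \tfrac12\alpha_x,\\
\psi_1(x) &= .25\,m_2(x),\\
\psi_2(x) &= .1875\,m_4(x) - .15625\,m_6(x),\\
\psi_3(x) &= .234375\,m_6(x) - .2734375\,m_8(x),\\
\psi_4(x) &= .41015625\,m_8(x) - .984375\,m_{10}(x) + .56396484\,m_{12}(x),
\end{aligned}
\]
and
\[
x = 2\sqrt N\tan\frac{\theta}{2} = \frac{2\sqrt N}{\sin\theta}\,(1-\cos\theta).
\]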


A Table of these functions has been calculated from the existing Table of Moment Functions*,
and appears on p. 471.

It may be noted that $\sqrt{\dfrac{2}{N}}\,\dfrac{\Gamma\{\frac12(N+2)\}}{\Gamma\{\frac12(N+1)\}} = [\,\ldots\,]\,c_0$, where $c_0$ is the first coefficient of
the earlier expansion, and is tabled from n=100 to n=400. An alternative method of evaluating
this expression is to use the Tables of the Gamma-Function†. Or again we have
\[
\sqrt{\frac{2}{N}}\,\frac{\Gamma\{\frac12(N+2)\}}{\Gamma\{\frac12(N+1)\}} = \sqrt{\frac{\pi}{2N}}\Big/\int_0^{\frac12\pi}\cos^N\theta\,d\theta,
\]
the complete integral being tabled from N=0 to N=104 in Biometrika, Vol. XI. p. 377.


* Tables for Statisticians and Biometricians, Table IX.
† Tracts for Computers, No. VIII.
.0000000 | 0


“0000000 | O 

“0000000 4. 2 
*0000002 |+ 9 
0000013 |+ 37 
“0000061 |+ 95 
“0000207 |+ 216 
*0000569 |+ 408 
0001339 |+ 686 
*0002795- | + 1040 
“0005291 |41451 


"0009238 |+ 1871 
0015056 | + 2245 


"0023119 | +2503 
*00336857) + 2580 
‘0046831 | 2498 
*0062405+t}| + 2011 
“0079990 | +1334 
“0098909 |+ 428 
“0118256 |-— 645 
*0136958 |—1798 
*0153862 | -— 2934 
0167832 |—3950 
‘0177852 |—4761 
“O183111 | — 5296 
‘0183074 |—5517 


| 
*O177520 |—5414 
Table of the Functions $\psi(x)$ for computing $\int_0^\theta \cos^N\theta\,d\theta$, where $x = 2\sqrt N\tan\tfrac12\theta$.

x | $\psi_0(x)$ | $\delta^2$ | $\delta^4$ | $\psi_1(x)$ | $\delta^2$ | $\psi_2(x)$ | $\delta^2$
0-0 | 00000000 0 0 }-0000000| 0 |-0000000 | 0 
0°1 | 08982784 |— 39597 |+1178 | -0000331 |4+1966}-0000000 |+ 16 
0°2 | 07925971 |— 78016 |+2296 } 0002628 |+3813]-0000016 |4+ 84 
0-3 | °11791142 |—114139 |4+3303 | 0008738 | +5436] -0000116 |+ 257 
0-4 | 15542174 |— 146960 |+4151 | 0020284 | +6744] 0000473 l4 554 
0°5 | °19146246 |—175630 | +4804 } -0038574 | +7666] 0001384 |+ 973 
0°6 | *22574688 |— 199496 |+5240 | °0064530 | +8160] -0003268 | +1489 
0-7 | *25803635-|—218121 |4+5449 | 0098646 | +8216] “0006641 | +2046 
0°8 | *28814460 |— 231298 |4+5434 } 0140978 |4+7848]-0012060 | +2577 
0°9 | 31593987 | — 239040 |4+5215-4 0191158 | +7097 [+0020056 | +3002 
1°0 | 34134475-| — 241568 |-+4815"] 0248435*| +6029] “0031054 | +3259 
1*1 | *36433394 +4272 [0311741 | +4721] 0045311 | +3283 
1:2 | 38493033 +3625*] 0379768 | +3256] -0062851 |+3051 
1°3 | -40319952 |— 222536 |+2914 | 0451051 |4+ 1728] 0083442 | +2560 
1-4 | 41924334 |— 209437 | +2183 | 0524062 |+ 218]-0106593 |+ 1834 
1°5 | 43319280 |—194155-|+ 1465-4 -0597291 | — 1201} -0131578 E 931 
1°6 | *44520071 |—177408 |4 793 | -0669319 |—2469] -0157494 82 
1°? | *45543454 |—159869 |+ 191 | 0738878 |—3538]-0183328 |-—1114 
18 | 46406968 |— 142138 |— 325-] 0804899 | — 4386] -0208048 | — 2092 
1:9 | -47128344 |—124733 |— 744 | -0866534 | — 4999 waonicind 2930 
2° | -47724987 |—108072 |—1064 ] -0923170 |—5381 | -0250374 | — 3581 
21 | *48213558 |— 92474 |— 1288 | -0974425-|— 5549] 0266491 |—3998 
2-2 | -48609655*| — 78164 |—1423 ] -1020131 |—5528] -0278610 | - 417: 
2°3 | 48927589 |— 65276 |—1481 | °1060309 |— 5348] 0286556 Petts 
2*4 | -49180246 |— 53870 | — 1475+] *1095139 | —5045 | 0290393 igre: 
25 | 49379033 |— 43939 |-1 418 } °1124924 |—465117-0290393 |-—3395 
+63 — 35426 | — 1324 | 1150058 |—4201] -0286998 | — 2836 
27 — 28237 |—1205-] °1170991 |—3720 0280767 | — 2207 
+8 | 49744487 |— 22253 |- 1072 | 1188204 | — 3237 — 1560 
9 | 49813419 |— 17340 |— 934 | -1202180 | - 2769 2331 | - 943 
30 | 49865010 |— 13362 |— 799 | 1213387 | —2331]-0251390 |— 381 
| v1 | 49903240 |— 10183 |— 671 | °1222263 | — 1930] -0240068 |+ 97 
32 | 49931286 |— 7675+) — 5557] -1229209 | — 1575] 0228843 |+ 479 
3°3 | °49951658 |— 5722 |— 451 | °1234580 |—1267] 0218097 |+ 759 
4 | 49966307 |— 4219 |— 361 | -1238684 |— 1005] 0208110 \+ 947 
25 | 49976737 |— 3078 |— 285-} °1241783 |— 787}-0199070 | +1049 
36 | *49984089 |— 2221 |— 221 | -1244095+|-— 606] -0191079 |+1079 
7 | 49989220 |— 1586 |— 170 J °1245801 |— 461] -0184167 |4+1053 
38 | -49992765+]— 1120 |— 128 | -1247046 |— 348] -0178308 |+ 987 
9) 49995190 |- —s- 788 96 } °1247943 |— 258] -0173436 |4+ 892 
40 | -49996833 |- 541 |- 70 9 -1248582 188 | -0169456 |+ 785 
4'1 | 49997934 |— 370 |— 52 | °1249033 |— 136] -0166261 |+ 672 
4°2 | -49998665*]/— 251 |— 337: | 1249348 |— 99] -0163738 |4+ 562 
473} 49999146 |— 168 |— 26 9 -1249564 68] -0161777 |+ 462 
44) °49999459 |—- 111 |—-—s-:18 J 1249712 |— 48] 0160278 |+ 368 
45 | +49999660 |— 73 |— 139 -°1249812 |- 341-0159147 |4 292 
4°6 | 49999789 | — 47 |}— 8 | -1249878 |— 22]-0158308 |+ 224 
47 | +49999870 | — 30 }— 649 °1249922 |— 16] -0157693 |+ 172 
48 | -49999921 |— 19 |— 4. fF -1249950+|- 9 -0157250-|4+ 127 
4-9 | -49999952 | - 12|— 3] -1249969 |- 7] 0156934 |+ 94 
50 | 49999971 |- 8 {+ 2] 1249981 |- 54-0156712 |+ 69 
® | “50000000 — - 4-1250000 | — | -0156250 




















0166552 | — 5008 
“0150576 hy 4345 
*0130255-| — 3492 
*0106442 | — 2520 
“0080109 |— 1511 
*0052265+|— 528 
“0023893 |+ 363 
“0004116 +1123 
- ‘0031002 |4-1724 
‘0056164 |4+2156 
‘OOT9LTO | +2423 
- 0099753 | +2541 
°0117795*| + 2531 
*0133306 | +2421 
0146396 | +2237 
‘0157249 | +2009 
“0166093 | +1756 
‘O173181 |+1498 
‘O178771 | +1249 
“0183112 |+ 1023 
‘0186430 |}+ 819 
“O1L88929 |+ 645 
‘O1L90783 |+ 497 
‘0192140 |+ 379 
0195312 — 








py (x) 
*O000000 0 
“0000000 0 
“0000000 0 
“0000000 0 
“0000000 |+ 2 
“0000002 j|+ 10 
“0000014 |+ 25 
0000051 |+ 67 
*0000155+/}+ 142 
“0000401 ;+ 269 
“0000916 |+ 455 
“0001886 |+ 694 
0003550 -|+ 973 
*OOOG187 |+1243 
“0010067 |+ 1453 
0015400 |+4+1525 
0022258 }+1399 
*0030515*| + 1012 
‘0039784 |+ 350 
“0049403 |— 574 
“0058448 | — 1687 
“0065806 | — 2885 
‘0070279 | — 4027 
*0070725-| — 4969 
‘0066202 | — 5567 
“0056112 | — 5730 
0040292 5390 
“0019082 4571 
— ‘0006699 | — 3331 
—°0035811 |—1788 
“OO66711 |-— 97 
- ‘0097708 | 4.1589 
— ‘0127116 |+3105 
~ "01538419 |4-4335 
~°O175387 |+ 5188 
— °0192167 |+5622 
— °0203325* | +5643 
— 0208840 | +5304 
— ‘0209051 | +4669 
— 0204593 | +3833 
— ‘0196302 |+ 2890 
— °0185121 |+15935 
- *0172005-| + 1030 
— ‘O157859 |+ 239 
— °0143474 |— 407 
— 0129496 |— 893 
— 0116411 |- 1223 
— °0104549 |— 1409 
— ‘0094096 }— 1473 
— 0085116 |—1448 
— ‘0077584 |— 1350 
— 0051270 —_— 
A table of numerical examples, designed to show the range of the new expansion, is given
below. Various methods of checking were employed. Some are the examples of the earlier
paper. Others, for lower values of N, were checked (a) by expansion in terms of multiple angles,
or (b) from the manuscript of the Tables of the Incomplete Beta-Function. It would appear that
seven-figure accuracy is obtainable by a computation which is easier than the previous one, and
over a considerably larger range of N. The full expansion (4) was used for N=9 and N=21. In
the case of N=101 the term in $1/N^4$ was dropped, while for the last example the term in $1/N^3$ was
also neglected. The standard deviation σ (the unit being sin θ) is given.





N | σ | sin θ | x | $I_\theta(N)$ | Correct result
9 | .3015 | .8 | 3 | .4991088 | .4991091
21 | .2085 | .4 | 1.9128785 | .4736101 | .4736101
— | — | .729325 | 3.9689483 | .4999737 | .4999737
101 | .0985 | .2 | 2.0304889 | .4791034 | .4791033
— | — | .3 | 3.0860353 | .4990127 | .4990127
401 | .0498 | .144 | 2.8987034 | .4981373 | .4981373





It may be noted that the mathematician who desires not the probability integral, but merely
$\int_0^\theta \cos^N\theta\,d\theta$, can readily compute it from the Table by the formula
\[
\int_0^\theta \cos^N\theta\,d\theta = \sqrt{\frac{2\pi}{N}}\left\{\psi_0(x)-\frac1N\psi_1(x)+\frac{1}{N^2}\psi_2(x)-\frac{1}{N^3}\psi_3(x)+\frac{1}{N^4}\psi_4(x)\right\},
\]
where $x = 2\sqrt N\tan\tfrac12\theta$.
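As a rough numerical illustration of the expansion, the sketch below integrates the truncated series (2) by Simpson's rule instead of using the tabled ψ functions, and compares the result with direct quadrature of $\cos^N\theta$. It assumes that $I_\theta(N)$ is the integral from 0 to θ divided by twice the complete integral from 0 to ½π (which agrees with the tabulated limit of .5); the function names are illustrative and not part of the paper.

```python
import math

def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0

def x_from_sin(N, sin_theta):
    """x = 2 sqrt(N) tan(theta/2), using tan(theta/2) = (1 - cos)/sin."""
    cos_theta = math.sqrt(1.0 - sin_theta ** 2)
    return 2.0 * math.sqrt(N) * (1.0 - cos_theta) / sin_theta

def bracket(x, N):
    """Polynomial factor of expansion (2), kept to order 1/N^4."""
    c = 1.0 / 3.0 + 1.0 / (2.0 * N)
    return (1.0 - x**2 / (4 * N) + x**4 / (16 * N**2)
            - c * x**6 / (32 * N**2) + c * x**8 / (128 * N**3)
            - x**10 / (960 * N**4) + x**12 / (18432 * N**4))

def complete_integral(N):
    """Integral of cos^N from 0 to pi/2, by quadrature."""
    return simpson(lambda t: math.cos(t) ** N, 0.0, math.pi / 2)

def I_series(N, sin_theta):
    """Probability integral from the truncated expansion (assumed normalisation)."""
    x = x_from_sin(N, sin_theta)
    partial = simpson(lambda u: math.exp(-0.5 * u * u) * bracket(u, N), 0.0, x)
    return partial / math.sqrt(N) / (2.0 * complete_integral(N))

def I_direct(N, sin_theta):
    """The same quantity by direct quadrature of cos^N."""
    theta = math.asin(sin_theta)
    partial = simpson(lambda t: math.cos(t) ** N, 0.0, theta)
    return partial / (2.0 * complete_integral(N))

for N, s in [(9, 0.8), (21, 0.4), (101, 0.2)]:
    print(N, s, round(x_from_sin(N, s), 7), I_series(N, s), I_direct(N, s))
```

Quadrature replaces the tables here only to keep the sketch self-contained; with the tabled $\psi_k(x)$ the same sum is obtained by five multiplications.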


Contributions to the Theory of Small Samples drawn from 
a finite Population*. 


By J. SPLAWA-NEYMAN, Ph.D., Warsaw.


I. Values of $B_1$, $B_2$ for the distribution of means of small samples taken from a finite population
with any given frequency distribution.

I understand by a small sample from a finite population one in which all the individuals in
the sample are obtained at a single drawing, i.e. no individual is returned before another is
drawn.

Let $\mu_k$ be the kth moment of the frequency curve of a population Q, containing m individuals.
The mean value of the character of the individuals considered we may suppose to be zero and
the particular values we shall denote by
\[
u_1,\ u_2,\ \ldots,\ u_m \tag{1}
\]
so that
\[
\mu_k = \frac1m\sum_{i=1}^m u_i^k \tag{2}
\]
Let as usual
\[
\beta_1 = \frac{\mu_3^2}{\mu_2^3} \tag{3}
\]
\[
\beta_2 = \frac{\mu_4}{\mu_2^2} \tag{4}
\]
Par 


* These results with others were originally published in La Revue Mensuelle de Statistique, publ.
par l'Office Central de Statistique de la République Polonaise, tom. VI. pp. 1-29, 1923, but as that
Journal may not be seen by English biometricians and as several important corrections have been
made they are reproduced here.








We shall consider a sample $\omega_i$ of n individuals taken at random from the population Q. It is
clear that it is possible to have
\[
N = \frac{m!}{n!\,(m-n)!} \tag{5}
\]
different samples $\omega_i$, and that the i can be equal to 1, 2, ... N.
The values of the individual character considered in the sample $\omega_i$ we shall denote by
\[
x_{i1},\ x_{i2},\ \ldots,\ x_{in} \tag{6}
\]
and their mean by $x_i$. Let us consider the population of N numbers $x_i$, where i=1, 2, ... N, and
calculate its four first moments about the mean, which we shall denote by $M_k$ (k=1, 2, 3, 4). It
is evident that




\[
M_1 = 0 \tag{7}
\]
If we take afterwards the ratios
\[
B_1 = \frac{M_3^2}{M_2^3}, \qquad B_2 = \frac{M_4}{M_2^2} \tag{8}
\]
we shall be able to form some idea about the distribution of $x_i$ if the distribution of (1) is given.
We shall begin with the calculation of $M_2$. Further, we shall denote by E(y) the mean value
of a variable y. Thus we have
\[
M_2 = E(x_i^2) = \frac{1}{n^2}E\left(\sum_{k=1}^n x_{ik}^2 + 2\sum_{k=1}^{n-1}\sum_{l=k+1}^n x_{ik}x_{il}\right)
= \frac{1}{n^2}\left(\sum_{k=1}^n E(x_{ik}^2) + 2\sum_{k=1}^{n-1}\sum_{l=k+1}^n E(x_{ik}x_{il})\right) \tag{9}
\]
It is evident that
\[
E(x_{ik}^2) = \mu_2 \tag{10}
\]
\[
E(x_{ik}x_{il}) = \frac{2\sum_{r=1}^{m-1}\sum_{s=r+1}^m u_r u_s}{m(m-1)} = -\frac{\sum_{r=1}^m u_r^2}{m(m-1)} = -\frac{\mu_2}{m-1} \tag{11}
\]
So, returning to (9), we have
\[
M_2 = \frac{1}{n^2}\left[n\mu_2 - \frac{n(n-1)}{m-1}\mu_2\right] = \frac{m-n}{n(m-1)}\,\mu_2 \tag{12}
\]
If we let m increase indefinitely, we shall have in the limit the well-known value for the
second moment of the mean, namely
\[
M_2 = \frac{\mu_2}{n} \tag{13}
\]
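Formula (12) can be checked by brute force on a small case. The sketch below enumerates every sample drawn without replacement from a made-up population of six values (the population is purely hypothetical) and compares the variance of the sample means with $(m-n)\mu_2/\{n(m-1)\}$, using exact fractions.

```python
from fractions import Fraction
from itertools import combinations

def moment(values, k):
    """k-th moment about the mean, as an exact Fraction."""
    count = len(values)
    mean = Fraction(sum(values), count)
    return sum((v - mean) ** k for v in values) / count

def sample_means(population, n):
    """Means of all distinct samples of n individuals (no replacement)."""
    return [Fraction(sum(s), n) for s in combinations(population, n)]

population = [0, 1, 1, 2, 5, 9]      # hypothetical population, m = 6
m = len(population)
mu2 = moment(population, 2)

for n in range(1, m):
    enumerated = moment(sample_means(population, n), 2)
    formula = Fraction(m - n, n * (m - 1)) * mu2
    print(n, enumerated, formula)
```

Moments are taken about the mean inside `moment`, so the population need not be centred at zero beforehand.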
Now we proceed to the third moment $M_3$. We have
\[
M_3 = E(x_i^3) = \frac{1}{n^3}E\left[\sum_{k=1}^n x_{ik}^3 + 3\sum_{k=1}^{n-1}\sum_{l=k+1}^n (x_{ik}^2x_{il} + x_{ik}x_{il}^2) + 6\sum_{k=1}^{n-2}\sum_{l=k+1}^{n-1}\sum_{r=l+1}^n x_{ik}x_{il}x_{ir}\right]
\]
\[
= \frac{1}{n^3}\left[\sum_{k=1}^n E(x_{ik}^3) + 3\sum_{k=1}^{n-1}\sum_{l=k+1}^n E(x_{ik}^2x_{il} + x_{ik}x_{il}^2) + 6\sum_{k=1}^{n-2}\sum_{l=k+1}^{n-1}\sum_{r=l+1}^n E(x_{ik}x_{il}x_{ir})\right] \tag{14}
\]
But
\[
E(x_{ik}^3) = \mu_3 \tag{15}
\]
\[
E(x_{ik}^2x_{il}) = \frac{\sum_{r=1}^{m-1}\sum_{s=r+1}^m (u_r^2u_s + u_ru_s^2)}{m(m-1)} \tag{16}
\]
\[
E(x_{ik}x_{il}x_{ir}) = \frac{6\sum_{r=1}^{m-2}\sum_{s=r+1}^{m-1}\sum_{t=s+1}^m u_ru_su_t}{m(m-1)(m-2)} \tag{17}
\]

Evidently we have, since $\sum u_i = 0$,
\[
\left(\sum_{i=1}^m u_i\right)^3 = \sum_{i=1}^m u_i^3 + 3\sum_{r=1}^{m-1}\sum_{s=r+1}^m (u_r^2u_s + u_ru_s^2) + 6\sum_{r=1}^{m-2}\sum_{s=r+1}^{m-1}\sum_{t=s+1}^m u_ru_su_t = 0 \tag{18}
\]
and
\[
\sum_{i=1}^m u_i\,\sum_{i=1}^m u_i^2 = \sum_{i=1}^m u_i^3 + \sum_{r=1}^{m-1}\sum_{s=r+1}^m (u_r^2u_s + u_ru_s^2) = 0 \tag{19}
\]
Hence
\[
\sum_{r=1}^{m-1}\sum_{s=r+1}^m (u_r^2u_s + u_ru_s^2) = -m\mu_3 \tag{20}
\]
\[
6\sum_{r=1}^{m-2}\sum_{s=r+1}^{m-1}\sum_{t=s+1}^m u_ru_su_t = 2m\mu_3 \tag{21}
\]
Putting these values in (16) and (17) and returning to (14), we have
\[
M_3 = \frac{1}{n^2}\left[\mu_3 - \frac{3(n-1)}{m-1}\mu_3 + \frac{2(n-1)(n-2)}{(m-1)(m-2)}\mu_3\right] = \frac{(m-n)(m-2n)}{n^2(m-1)(m-2)}\,\mu_3 \tag{22}
\]
a very symmetrical formula. It is evident that $M_3 \to \mu_3/n^2$ if $m \to \infty$.


The calculation of $M_4$ follows in the same way. We have
\[
M_4 = E(x_i^4) = \frac{1}{n^4}E\Bigg[\sum_{k=1}^n x_{ik}^4 + 4\sum_{k=1}^{n-1}\sum_{l=k+1}^n (x_{ik}^3x_{il} + x_{ik}x_{il}^3) + 6\sum_{k=1}^{n-1}\sum_{l=k+1}^n x_{ik}^2x_{il}^2
\]
\[
+ 12\sum_{k=1}^{n-2}\sum_{l=k+1}^{n-1}\sum_{s=l+1}^n (x_{ik}^2x_{il}x_{is} + x_{ik}x_{il}^2x_{is} + x_{ik}x_{il}x_{is}^2) + 24\sum_{k=1}^{n-3}\sum_{l=k+1}^{n-2}\sum_{s=l+1}^{n-1}\sum_{t=s+1}^n x_{ik}x_{il}x_{is}x_{it}\Bigg] \tag{23}
\]
Hence we have
\[
E(x_{ik}^4) = \mu_4 \tag{24}
\]
\[
E(x_{ik}^3x_{il}) = \frac{\sum_{r=1}^{m-1}\sum_{s=r+1}^m (u_r^3u_s + u_ru_s^3)}{m(m-1)} = \frac{A}{m(m-1)} \text{ (say)} \tag{25}
\]
\[
E(x_{ik}^2x_{il}^2) = \frac{2\sum_{r=1}^{m-1}\sum_{s=r+1}^m u_r^2u_s^2}{m(m-1)} = \frac{B}{m(m-1)} \text{ (say)} \tag{26}
\]
\[
E(x_{ik}^2x_{il}x_{is}) = \frac{2\sum_{r=1}^{m-2}\sum_{s=r+1}^{m-1}\sum_{t=s+1}^m (u_r^2u_su_t + u_ru_s^2u_t + u_ru_su_t^2)}{m(m-1)(m-2)} = \frac{C}{m(m-1)(m-2)} \text{ (say)} \tag{27}
\]
\[
E(x_{ik}x_{il}x_{is}x_{it}) = \frac{24\sum_{r=1}^{m-3}\sum_{s=r+1}^{m-2}\sum_{t=s+1}^{m-1}\sum_{v=t+1}^m u_ru_su_tu_v}{m(m-1)(m-2)(m-3)} = \frac{D}{m(m-1)(m-2)(m-3)} \text{ (say)} \tag{28}
\]
It is evident that
\[
\left(\sum_{i=1}^m u_i\right)^4 = m\mu_4 + 4A + 3B + 6C + D = 0 \tag{29}
\]
\[
\sum_{i=1}^m u_i^3\,\sum_{i=1}^m u_i = m\mu_4 + A = 0 \tag{30}
\]
\[
\left(\sum_{i=1}^m u_i\right)^2\sum_{i=1}^m u_i^2 = m\mu_4 + 2A + B + C = 0 \tag{31}
\]
\[
\left(\sum_{i=1}^m u_i^2\right)^2 = m\mu_4 + B = m^2\mu_2^2 \tag{32}
\]
Hence we have
\[
A = -m\mu_4 \tag{33}
\]
\[
B = m^2\mu_2^2 - m\mu_4 \tag{34}
\]
\[
C = 2m\mu_4 - m^2\mu_2^2 \tag{35}
\]
\[
D = 3m^2\mu_2^2 - 6m\mu_4 \tag{36}
\]
Putting these values into (25), (26), (27) and (28) and returning to (23), we have
\[
M_4 = \frac{1}{n^3}\Bigg[\mu_4 - 4\frac{n-1}{m-1}\mu_4 + 3\frac{n-1}{m-1}(m\mu_2^2 - \mu_4) + \frac{6(n-1)(n-2)}{(m-1)(m-2)}(2\mu_4 - m\mu_2^2) + \frac{(n-1)(n-2)(n-3)}{(m-1)(m-2)(m-3)}(3m\mu_2^2 - 6\mu_4)\Bigg]
\]
\[
= \frac{m-n}{n^3(m-1)(m-2)(m-3)}\left[(m^2 - 6mn + m + 6n^2)\mu_4 + 3m(m-n-1)(n-1)\mu_2^2\right] \tag{37}
\]
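Formulae (22) and (37) lend themselves to the same kind of exhaustive check. The sketch below, on a hypothetical six-member population, enumerates all samples without replacement and compares the third and fourth moments of the sample means with the closed forms; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

def moment(values, k):
    """k-th moment about the mean, as an exact Fraction."""
    count = len(values)
    mean = Fraction(sum(values), count)
    return sum((v - mean) ** k for v in values) / count

def enumerated_moments(population, n):
    """(M3, M4) of the means of all samples of size n, by enumeration."""
    means = [Fraction(sum(s), n) for s in combinations(population, n)]
    return moment(means, 3), moment(means, 4)

def formula_moments(population, n):
    """(M3, M4) from the closed forms (22) and (37)."""
    m = len(population)
    mu2, mu3, mu4 = (moment(population, k) for k in (2, 3, 4))
    M3 = Fraction((m - n) * (m - 2 * n), n**2 * (m - 1) * (m - 2)) * mu3
    M4 = Fraction(m - n, n**3 * (m - 1) * (m - 2) * (m - 3)) * (
        (m * m - 6 * m * n + m + 6 * n * n) * mu4
        + 3 * m * (m - n - 1) * (n - 1) * mu2**2
    )
    return M3, M4

population = [0, 1, 1, 2, 5, 9]      # hypothetical population, m = 6
for n in range(1, len(population)):
    print(n, enumerated_moments(population, n), formula_moments(population, n))
```

Note that the formulae require m > 3, since the factor (m−3) appears in the denominator of (37).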
We can now calculate $B_1$ and $B_2$. We have
\[
B_1 = \frac{M_3^2}{M_2^3} = \frac{(m-1)(m-2n)^2}{n(m-2)^2(m-n)}\,\beta_1 \tag{38}
\]
\[
B_2 = \frac{M_4}{M_2^2} = \frac{(m-1)\left[(m^2 - 6mn + m + 6n^2)\beta_2 + 3m(m-n-1)(n-1)\right]}{n(m-2)(m-3)(m-n)} \tag{39}
\]
The formula (39) can be transformed into the following:
\[
B_2 - 3\frac{m-1}{m+1} = \frac{m-1}{(m-2)(m-3)}\left[\frac{m(m+1)}{n(m-n)} - 6\right]\left[\beta_2 - 3\frac{m-1}{m+1}\right] \tag{40}
\]
If $m \to \infty$,
\[
B_1 \to B_{1\infty} = \frac{\beta_1}{n} \tag{41}
\]
\[
B_2 \to B_{2\infty} = 3 + \frac{\beta_2 - 3}{n} \tag{42}
\]
Let
\[
k = 2\beta_2 - 3\beta_1 - 6 \tag{43}
\]


then, eliminating $\beta_1$ and $\beta_2$ from (41), (42) and (43), we have
\[
2B_{2\infty} - 3B_{1\infty} - 6 = \frac{k}{n} \tag{44}
\]
We can obtain another interesting result by eliminating n from (41) and (42). We have
\[
B_{2\infty} = 3 + \frac{\beta_2 - 3}{\beta_1}\,B_{1\infty} \tag{45}
\]
a formula showing that if we increase n, the point M $(B_{1\infty}, B_{2\infty})$ will approach the normal point
(0, 3) following the right line from $(\beta_1, \beta_2)$ to the normal point.

If we take
\[
\frac{dB_{1\infty}}{dn} = -\frac{\beta_1}{n^2} \tag{46}
\]
\[
\frac{dB_{2\infty}}{dn} = -\frac{\beta_2 - 3}{n^2} \tag{47}
\]
we see that the rates of decrease of $B_{1\infty}$ and $B_{2\infty}$ vary inversely with the square of n. We
conclude that small values of n are sufficient to bring the point $(B_{1\infty}, B_{2\infty})$ close to the normal point.

We also see from (43) and (44) that if in the classification of Pearson we distinguish only
three types of curves depending upon the sign of the criterion k, the frequency distribution of the
mean of the sample approaches the normal one without change of the type.


Mutatis mutandis these results can be extended to $B_1$ and $B_2$.

Certainly we find here some differences. We commence by eliminating $\dfrac{m^2}{n(m-n)}$ from (38) and (40).
Putting $A = \beta_2 - 3\dfrac{m-1}{m+1}$, we have
\[
B_2 - \left(3 - 2\,\frac{m^2(3+A) - 9m - A}{(m-3)\,m\,(m+1)}\right) = \frac{(m+1)(m-2)}{m(m-3)}\,\frac{A}{\beta_1}\,B_1
\]
for the equation of the straight line which follows the point $(B_1, B_2)$.

When n increases from 1 to $\dfrac{m}{2}$, or to $\dfrac{m-1}{2}$, this point starts from $(\beta_1, \beta_2)$ and tends to the
limit $M'\left(0,\ 3 - 2\,\dfrac{m^2(3+A) - 9m - A}{(m-3)\,m\,(m+1)}\right)$. Afterwards, if n still increases, the point $(B_1, B_2)$
following the same way tends to $(\beta_1, \beta_2)$. It is interesting to note that the formulae (38) and (40) do
not change the values of $B_1$ and $B_2$ if we put m−n instead of n. We see therefore that the
distribution of means of samples containing n individuals has the same character as that of means
of samples containing m−n individuals.
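The behaviour of the point $(B_1, B_2)$ described here can be illustrated numerically. The sketch below evaluates (38) and (39) exactly for assumed values m = 20, $\beta_1$ = 1/2, $\beta_2$ = 7/2 (chosen only for illustration) and makes it possible to inspect the symmetry in n and m−n, the start at $(\beta_1, \beta_2)$, and the limit-point reached at n = m/2.

```python
from fractions import Fraction

def B1(m, n, beta1):
    """Formula (38)."""
    return Fraction((m - 1) * (m - 2 * n) ** 2, n * (m - 2) ** 2 * (m - n)) * beta1

def B2(m, n, beta2):
    """Formula (39)."""
    inner = (m * m - 6 * m * n + m + 6 * n * n) * beta2 + 3 * m * (m - n - 1) * (n - 1)
    return Fraction(m - 1, n * (m - 2) * (m - 3) * (m - n)) * inner

def limit_ordinate(m, beta2):
    """Ordinate of the limit-point M' on the axis B1 = 0."""
    A = beta2 - Fraction(3 * (m - 1), m + 1)
    xi = m * m * (3 + A) - 9 * m - A
    return 3 - 2 * xi / ((m - 3) * m * (m + 1))

# Assumed illustrative values, not taken from the paper.
m, beta1, beta2 = 20, Fraction(1, 2), Fraction(7, 2)
points = [(B1(m, n, beta1), B2(m, n, beta2)) for n in range(1, m)]
for n, (b1, b2) in enumerate(points, start=1):
    print(n, float(b1), float(b2))
```

With these values the printed points run from (.5, 3.5) down the line to M' at n = 10 and then retrace the same path.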
Now consider the point $M'\left(0,\ 3 - 2\,\dfrac{m^2(3+A) - 9m - A}{(m-3)\,m\,(m+1)}\right)$. We see that the ordinate differs
from 3 only by a number of the order of $\dfrac1m$, and thus in most cases the point considered is
practically the normal point. Its ordinate may be smaller or larger than 3 according to the
sign of
\[
\xi = m^2(3+A) - 9m - A,
\]
which may be positive or negative.

But this last case has no importance as ξ is negative only if
\[
m < \frac{9 + \sqrt{81 + 4(3+A)A}}{2(3+A)} < 10.
\]
As the number of individuals in the general population is usually greater than 10, we can say
that the ordinate of $M'$ is as a rule less than 3. Now, if
\[
k = 2\beta_2 - 3\beta_1 - 6 < 0,
\]
the straight line between the two points $M'$ and $(\beta_1, \beta_2)$ does not cross the line
\[
2B_2 - 3B_1 - 6 = 0,
\]
and the type of the frequency distribution of means of samples is the same for all values of n.
If k > 0, we have one, or two changes of type. The last case happens in the case when there is
an integer value of n > 1, for which
\[
2B_2 - 3B_1 - 6 = 0,
\]
where $B_1$ and $B_2$ are regarded as functions of n given by (38), (39).
Taking the differentials from (38) and (40), we have
\[
\frac{dB_1}{dn} = -\frac{m^2(m-1)}{(m-2)^2}\,\frac{m-2n}{n^2(m-n)^2}\,\beta_1 \tag{48}
\]
\[
\frac{dB_2}{dn} = -\frac{m^2-1}{(m-2)(m-3)}\,\frac{m(m-2n)}{n^2(m-n)^2}\left[\beta_2 - 3\frac{m-1}{m+1}\right] \tag{49}
\]
We see that their absolute values decrease quickly to zero if n tends to ½m, and we conclude
that small values of n are sufficient to cause $(B_1, B_2)$ to approach the limit-point $M'$.

II. The second moment of the squared standard deviation of a sample, the sample being taken
at random from a finite population with a given distribution.

The second moment required is given by the formula
\[
\sigma^2_{\sigma^2} = E(\sigma_i^4) - [E(\sigma_i^2)]^2.
\]
Using the formulae (10) and (11), it is easy to calculate
\[
E(\sigma_i^2) = E\left(\frac1n\sum_{k=1}^n x_{ik}^2 - x_i^2\right) = \frac{n-1}{n}\left[E(x_{ik}^2) - E(x_{ik}x_{il})\right] = \frac{(n-1)\,m}{n(m-1)}\,\mu_2 \tag{50}
\]


Afterwards we observe that 


nm-l1 ® Q n-1 n 2 
2 
of=|—y, 2 4-5 > S rr 
t= k= =k+1 


Rn" k=l e” k=l (= 
(n—1)* nu n-1 “ : 
= j > wy'+2 & > ty2a," 
n J k=1 l=h 





n—-1[fr-1 2 n-2 n-1 oR 
> 34 +3 124° . , 0.2 4 y . », 2) 
—4—— | SS (wp tatevP)+ FF ZF (Mp Mag t+ Celi Viet Va Caris’) 
k=1 1=k-H1 1 


k=1 l=k+1 s=/l+4 


4 n-1 u a a n—-2 n-1 ” - " i. 
+—=|  & wy*rxg*+2 2 & SZ (Hie? Vy Lig Vin Vi? Vig + Ln VnVs") 
l=k+1 k=1 1=k+1 s=l41 


n— n—2 n-l1 u 
+6 5 FS SS HM Mig Lit 
k=1 l=k+1 s=I141 t=s+1 
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‘ 9 P 
(n—1)? n n* —2n+3 n-1 ” “ " n-1 n=l n 
2 5 os Ser 8 ook. nal 
| = ae > nt +2 jee Lame > > URE VT 4 ae > = (ex: ad ats ka ) 
| n k=1 n k=1 l=k+1 N° k=1 l=k+1 
’ n—-é n-2 n-1 ” 2 ‘ 
4S (HP NXg AL Lj" Lig t Vp Ljn Vin") 
" 2” k=1 l=k+1 s=1+1 
24 n-3 n—-2 n-l1 n 


+ ri = = = D> Ly VaLigXy- 
W” k=) l=k+1 s=l+1 t=s+t 
Using the formulae (24)-(36) we have
\[
E(\sigma_i^4) = \frac{1}{n^3}\Bigg[(n-1)^2\mu_4 + \frac{(n^2-2n+3)(n-1)}{m-1}(m\mu_2^2 - \mu_4) + 4\,\frac{(n-1)^2}{m-1}\mu_4 - 2\,\frac{(n-1)(n-2)(n-3)}{(m-1)(m-2)}(2\mu_4 - m\mu_2^2) + \frac{(n-1)(n-2)(n-3)}{(m-1)(m-2)(m-3)}(3m\mu_2^2 - 6\mu_4)\Bigg]
\]
\[
= \frac{m(m-n)(mn-m-n-1)(n-1)}{n^3(m-1)(m-2)(m-3)}\,\mu_4 + \frac{m(n-1)\left[m^2(n^2-2n+3) - m(3n^2+3) + 3n^2 + 3n\right]}{n^3(m-1)(m-2)(m-3)}\,\mu_2^2 \tag{51}
\]



If we square (50) and subtract it from (51), we have
\[
\sigma^2_{\sigma^2} = \frac{m(m-n)(n-1)}{n^3(m-1)^2(m-2)(m-3)}\,\mu_2^2\left[(mn-m-n-1)(m-1)\beta_2 - (m^2n - 3m^2 + 6m - 3n - 3)\right] \tag{52}
\]
a formula giving the squared standard deviation of the squared standard deviations in samples.
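Formulae (50) and (52) can also be verified by enumeration. In the sketch below the population is hypothetical, the sample "squared standard deviation" is taken with divisor n as in the text, and the names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

def moment(values, k):
    """k-th moment about the mean, as an exact Fraction."""
    count = len(values)
    mean = Fraction(sum(values), count)
    return sum((v - mean) ** k for v in values) / count

def sample_variances(population, n):
    """Squared standard deviation (divisor n) of every sample of size n."""
    out = []
    for s in combinations(population, n):
        mean = Fraction(sum(s), n)
        out.append(Fraction(sum(v * v for v in s), n) - mean ** 2)
    return out

def mean_formula(m, n, mu2):
    """Formula (50)."""
    return Fraction((n - 1) * m, n * (m - 1)) * mu2

def variance_formula(m, n, mu2, beta2):
    """Formula (52)."""
    bracket = ((m * n - m - n - 1) * (m - 1) * beta2
               - (m * m * n - 3 * m * m + 6 * m - 3 * n - 3))
    return Fraction(m * (m - n) * (n - 1),
                    n**3 * (m - 1) ** 2 * (m - 2) * (m - 3)) * mu2**2 * bracket

population = [0, 1, 1, 2, 5, 9]      # hypothetical population, m = 6
m = len(population)
mu2, mu4 = moment(population, 2), moment(population, 4)
beta2 = mu4 / mu2**2

for n in range(2, m):
    v = sample_variances(population, n)
    print(n, sum(v) / len(v), mean_formula(m, n, mu2),
          moment(v, 2), variance_formula(m, n, mu2, beta2))
```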


This result, being a generalisation of formulae given by other authors, is, I believe, novel and
of considerable importance. It agrees for m indefinitely large with the value given by “Student”
in Biometrika, Vol. VI. p. 3. For he has
\[
E(\sigma_i^2) = \frac{n-1}{n}\,\mu_2, \qquad E(\sigma_i^4) = \frac{(n-1)^2}{n^3}\,\mu_4 + \frac{(n-1)(n^2-2n+3)}{n^3}\,\mu_2^2,
\]
and therefore
\[
\sigma^2_{\sigma^2} = \frac{(n-1)^2}{n^3}\,\mu_4 - \frac{(n-1)(n-3)}{n^3}\,\mu_2^2.
\]
See also Tchouproff, Biometrika, Vol. XII. p. 193, and Church, ibid. Vol. XVII. p. 82.

III. The correlation between the square of the deviation of the mean of a sample from the mean
of the sampled population with the square of the standard deviation, the sample being taken at
random from a finite population with any given distribution.

Let R be the required coefficient of correlation. We shall have
\[
R = \frac{E(x_i^2\sigma_i^2) - E(x_i^2)\,E(\sigma_i^2)}{\sqrt{\{E(x_i^4) - [E(x_i^2)]^2\}\,\{E(\sigma_i^4) - [E(\sigma_i^2)]^2\}}} \tag{53}
\]





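It is evident that
\[
x_i^2\sigma_i^2 = x_i^2\left[\frac1n\sum_{k=1}^n x_{ik}^2 - x_i^2\right] = \frac{1}{n^3}\left(\sum_{k=1}^n x_{ik}\right)^2\sum_{k=1}^n x_{ik}^2 - x_i^4
\]
\[
= \frac{1}{n^3}\Bigg[\sum_{k=1}^n x_{ik}^4 + 2\sum_{k=1}^{n-1}\sum_{l=k+1}^n (x_{ik}^3x_{il} + x_{ik}x_{il}^3 + x_{ik}^2x_{il}^2) + 2\sum_{k=1}^{n-2}\sum_{l=k+1}^{n-1}\sum_{s=l+1}^n (x_{ik}^2x_{il}x_{is} + x_{ik}x_{il}^2x_{is} + x_{ik}x_{il}x_{is}^2)\Bigg] - x_i^4 \tag{54}
\]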


Hence, to calculate 2 («,?0;2), we may use the calculated means of #4, ay27;2, vg3ay and 
vy? Ly Xj,, namely (24)—(27) and (37). We shall have 





$$E(\bar{x}_i^2\sigma_i^2) = \frac{1}{n^2}\left[\mu_4 + \frac{n-1}{m-1}\,(m\mu_2^2 - \mu_4) - 2\,\frac{n-1}{m-1}\,\mu_4 + \frac{(n-1)(n-2)}{(m-1)(m-2)}\,(2\mu_4 - m\mu_2^2)\right] - M_4$$

$$= \left[\frac{(m-n)(m-2n)}{n^2(m-1)(m-2)} - \frac{(m-n)(m^2-6mn+m+6n^2)}{n^3(m-1)(m-2)(m-3)}\right]\mu_4 + \left[\frac{m(m-n)(n-1)}{n^2(m-1)(m-2)} - \frac{3m(m-n)(m-n-1)(n-1)}{n^3(m-1)(m-2)(m-3)}\right]\mu_2^2$$

$$= \frac{m(m-n)(m-2n+1)(n-1)}{n^3(m-1)(m-2)(m-3)}\,\mu_4 + \frac{m(m-n)(mn-3m+3)(n-1)}{n^3(m-1)(m-2)(m-3)}\,\mu_2^2. \qquad (55)$$











Now from (12)
$$E(\bar{x}_i^2) = M_2 = \frac{m-n}{n(m-1)}\,\mu_2, \qquad (56)$$
and it is easy to calculate
$$E(\sigma_i^2) = E\left(\frac{1}{n}\sum_{k=1}^{n} x_{ik}^2 - \bar{x}_i^2\right) = \frac{1}{n}\sum_{k=1}^{n} E(x_{ik}^2) - E(\bar{x}_i^2) = \frac{m(n-1)}{n(m-1)}\,\mu_2, \qquad (57)$$
using the same formulae (10) and (11). Multiplying (56) and (57) and subtracting from (55), we
have
$$R\,\sigma_1\sigma_2 = \frac{m(m-n)(m-2n+1)(n-1)}{n^3(m-1)(m-2)(m-3)}\,\mu_4 - \frac{m(m-n)\,[3m^2 - m(4n+6) + 6n + 3]\,(n-1)}{n^3(m-1)^2(m-2)(m-3)}\,\mu_2^2$$

$$= \frac{m(m-n)(n-1)}{n^3(m-1)^2(m-2)(m-3)}\,\mu_2^2\,\big\{(m-2n+1)(m-1)\,\beta_2 - [3m^2 - m(4n+6) + 6n + 3]\big\}, \qquad (58)$$


where $\sigma_1$, $\sigma_2$ are the standard deviations of $\bar{x}_i^2$ and $\sigma_i^2$ respectively. We have further
$$\sigma_1^2 = E(\bar{x}_i^4) - [E(\bar{x}_i^2)]^2 = M_4 - M_2^2$$

$$= \frac{m-n}{n^3(m-1)(m-2)(m-3)}\,\big[(m^2 - 6mn + m + 6n^2)\,\mu_4 + 3m(m-n-1)(n-1)\,\mu_2^2\big] - \frac{(m-n)^2}{n^2(m-1)^2}\,\mu_2^2$$

$$= \frac{(m-n)\,\mu_2^2}{n^3(m-1)^2(m-2)(m-3)}\,\big[(m^2 - 6mn + m + 6n^2)(m-1)\,\beta_2 + 2n(m-n)(m^2+m-3) - 3m(m-1)^2\big]. \qquad (59)$$


Now, putting the values (58), (59) and (52) in the equation (53), we have 
$$R = \frac{\sqrt{m(n-1)}\;\big[(m-2n+1)(m-1)\,\beta_2 - \{3m^2 - 2m(2n+3) + 3(2n+1)\}\big]}{\sqrt{\big[(m^2-6mn+m+6n^2)(m-1)\,\beta_2 + 2n(m-n)(m^2+m-3) - 3m(m-1)^2\big]\,\big[(mn-m-n-1)(m-1)\,\beta_2 - (m^2n - 3m^2 + 6m - 3n - 3)\big]}}. \qquad (60)$$





As the above formula is very complicated it is desirable to give some approximations. The
first we reach by assuming that $m$ is indefinitely large, but $n$ finite. Dividing the numerator and
the denominator of $R$ by $m^{5/2}$ and increasing $m$ indefinitely, we have

$$R = \frac{\sqrt{n-1}\,(\beta_2 - 3)}{\sqrt{(\beta_2 + 2n - 3)\,[(n-1)\,\beta_2 - n + 3]}}. \qquad (61)$$
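
The exact expression (60) and its limit (61) can be compared with a direct enumeration. The following is a minimal sketch under stated assumptions: all samples of n out of m are taken as equally likely, the standard deviation uses divisor n, the six-member population is hypothetical, and the function names are invented for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import sqrt


def central_moment(values, k):
    """k-th moment about the mean, with divisor len(values)."""
    size = len(values)
    mean = Fraction(sum(values), size)
    return sum((v - mean) ** k for v in values) / size


def correlation_enumerated(population, n):
    """Correlation of (mean deviation)^2 with sigma^2 over all samples of size n."""
    pop_mean = Fraction(sum(population), len(population))
    pairs = [((Fraction(sum(s), n) - pop_mean) ** 2, central_moment(s, 2))
             for s in combinations(population, n)]
    count = len(pairs)
    mean_a = sum(a for a, _ in pairs) / count
    mean_b = sum(b for _, b in pairs) / count
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs) / count
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs) / count
    var_b = sum((b - mean_b) ** 2 for _, b in pairs) / count
    return float(cov) / sqrt(float(var_a * var_b))


def correlation_formula_60(population, n):
    """Formula (60); needs m > 3 and a population for which both variances are non-zero."""
    m = len(population)
    mu2 = central_moment(population, 2)
    beta2 = central_moment(population, 4) / mu2 ** 2
    num = ((m - 2 * n + 1) * (m - 1) * beta2
           - (3 * m * m - 2 * m * (2 * n + 3) + 3 * (2 * n + 1)))
    d1 = ((m * m - 6 * m * n + m + 6 * n * n) * (m - 1) * beta2
          + 2 * n * (m - n) * (m * m + m - 3) - 3 * m * (m - 1) ** 2)
    d2 = ((m * n - m - n - 1) * (m - 1) * beta2
          - (m * m * n - 3 * m * m + 6 * m - 3 * n - 3))
    return sqrt(m * (n - 1)) * float(num) / sqrt(float(d1 * d2))


def correlation_limit_61(beta2, n):
    """Limiting form (61) for an indefinitely large population."""
    return sqrt(n - 1) * (beta2 - 3) / sqrt((beta2 + 2 * n - 3) * ((n - 1) * beta2 - n + 3))


population = [0, 1, 3, 7, 12, 20]  # hypothetical population, m = 6
print(correlation_enumerated(population, 3))
print(correlation_formula_60(population, 3))
print(correlation_limit_61(3.0, 5))
```

The first two values should agree to rounding error. The third shows that in the limit (61) the correlation vanishes when $\beta_2 = 3$.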


But it is also possible to find an approximation for a finite sample from a finite population.
I followed at first the usual method, expanding $R$ in a series in terms of $1/m$, and retaining only
the first two terms, but this method has two disadvantages. First, the formula reached is as
complicated as (60), and secondly it assumes that not only $a/m^2$ is negligible, where $a$ is of the
order of a single digit, but also that all ratios $n^2/m^2$ can be neglected. It seemed better therefore
to suppose that all ratios $a/m$ or less and also $\beta_2/m$ were negligible, and to retain $n/m$ and its powers. We
thus reach a very simple approximative formula:


l B.-3 (1 =) 
R= - m 2n m 69 


7, = “2, as aT | “one ae Sear (02). 
J(1 ~ mM re 2n ) (, —] + 2n ) 





If $\beta_2 = 3$, we have


Rn fe 
V m(m—n) 







IV. The correlation between the mean of a sample and its squared standard deviation, the sample
being taken at random from a finite population.

In the previous notation $\bar{x}_i$ was the deviation of the mean of the $i$th sample from the mean of
the sampled population. Thus the correlation required is given by
$$\rho = \frac{E[\bar{x}_i\,(\sigma_i^2 - E(\sigma_i^2))]}{\sigma_{\sigma^2}\sqrt{M_2}}. \qquad (64)$$
But as $E[\bar{x}_i\,E(\sigma_i^2)] = E(\bar{x}_i)\,E(\sigma_i^2) = 0$,
we have
$$\rho = \frac{E[\bar{x}_i\sigma_i^2]}{\sigma_{\sigma^2}\sqrt{M_2}}. \qquad (65)$$


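As $\sigma_{\sigma^2}$ and $M_2$ have been calculated already, we need only find $E[\bar{x}_i\sigma_i^2]$. Now,

$$\bar{x}_i\sigma_i^2 = \bar{x}_i\left[\frac{1}{n}\sum_{k=1}^{n} x_{ik}^2 - \bar{x}_i^2\right] = \frac{1}{n^2}\left\{\sum_{k=1}^{n} x_{ik}^3 + \sum_{k=1}^{n-1}\sum_{l=k+1}^{n}\left(x_{ik}^2x_{il} + x_{ik}x_{il}^2\right)\right\} - \bar{x}_i^3,$$

and therefore

$$E[\bar{x}_i\sigma_i^2] = \frac{1}{n}\,E(x_{ik}^3) + \frac{n-1}{n}\,E(x_{ik}^2x_{il}) - M_3,$$

whence using the formulae (15), (16), (20), and (22), we find easily

$$E[\bar{x}_i\sigma_i^2] = \frac{m(m-n)(n-1)}{n^2(m-1)(m-2)}\,\mu_3. \qquad (66)$$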
Putting this result into (65) and using (12) and (52), we have finally

$$\rho = \frac{\sqrt{m(m-1)(m-3)(n-1)}\;\sqrt{\beta_1}}{\sqrt{(m-2)\,\{(mn-m-n-1)(m-1)\,\beta_2 - (m^2n - 3m^2 + 6m - 3n - 3)\}}}. \qquad (67)$$


This result is a generalisation, now first published, of one previously reached. If $m$ be made
indefinitely large, we have in the limit,
$$\rho = \frac{\sqrt{n-1}\,\sqrt{\beta_1}}{\sqrt{(n-1)\,\beta_2 - n + 3}}, \qquad (68)$$
and for a large sample*,
$$\rho = \sqrt{\beta_1}\,/\sqrt{\beta_2 - 1}. \qquad (69)$$
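
As an illustration only, formula (67) may be set against an enumeration of all samples from a small population. The sketch assumes equally likely samples drawn without replacement, a standard deviation with divisor n, and the convention that $\sqrt{\beta_1}$ carries the sign of $\mu_3$, i.e. $\sqrt{\beta_1} = \mu_3/\mu_2^{3/2}$. That convention, the population and the function names are assumptions of the example.

```python
from fractions import Fraction
from itertools import combinations
from math import sqrt


def central_moment(values, k):
    """k-th moment about the mean, with divisor len(values)."""
    size = len(values)
    mean = Fraction(sum(values), size)
    return sum((v - mean) ** k for v in values) / size


def rho_enumerated(population, n):
    """Correlation of the sample mean with sigma^2 over all samples of size n."""
    pop_mean = Fraction(sum(population), len(population))
    pairs = [(Fraction(sum(s), n) - pop_mean, central_moment(s, 2))
             for s in combinations(population, n)]
    count = len(pairs)
    mean_x = sum(x for x, _ in pairs) / count
    mean_s = sum(s for _, s in pairs) / count
    cov = sum((x - mean_x) * (s - mean_s) for x, s in pairs) / count
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs) / count
    var_s = sum((s - mean_s) ** 2 for _, s in pairs) / count
    return float(cov) / sqrt(float(var_x * var_s))


def rho_formula_67(population, n):
    """Formula (67), with sqrt(beta_1) taken as mu_3 / mu_2^(3/2) (assumed sign convention)."""
    m = len(population)
    mu2 = central_moment(population, 2)
    mu3 = central_moment(population, 3)
    beta2 = float(central_moment(population, 4) / mu2 ** 2)
    root_beta1 = float(mu3) / float(mu2) ** 1.5
    bracket = ((m * n - m - n - 1) * (m - 1) * beta2
               - (m * m * n - 3 * m * m + 6 * m - 3 * n - 3))
    return sqrt(m * (m - 1) * (m - 3) * (n - 1)) * root_beta1 / sqrt((m - 2) * bracket)


def rho_limit_68(root_beta1, beta2, n):
    """Limiting form (68) for an indefinitely large population."""
    return sqrt(n - 1) * root_beta1 / sqrt((n - 1) * beta2 - n + 3)


population = [0, 1, 3, 7, 12, 20]  # hypothetical skew population, m = 6
print(rho_enumerated(population, 3))
print(rho_formula_67(population, 3))
```

The two printed values should agree to rounding error. As n grows, `rho_limit_68` tends to the large-sample value (69).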


We see that, if the frequency distribution of the original population be symmetrical and thus
$\beta_1 = 0$, the coefficient of correlation between the mean and the squared standard deviation is
always zero. It would be erroneous, however, to assume that they are independent. The only
known case of independency of the mean and the standard deviation is when the original population
is normal and indefinitely large. The above results enable us to prove that if the original
population be indefinitely large, but not normal, the standard deviation and the mean of a
sample, and also all their powers, are not independent. This result seems to me important,
because it shows that the normal curve is the only curve by which, knowing the frequency distribution
$y = f(\bar{x})$ of the mean of a sample and the frequency distribution $y = \phi(\sigma)$ of the standard
deviation, we reach the frequency surface of means and standard deviations simply by multiplying

$$F(\bar{x}, \sigma) = f(\bar{x}) \times \phi(\sigma).$$

In all other cases such multiplying would be illegitimate.
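
The distinction between zero correlation and independence can be seen on a very small symmetrical case. The sketch below is an illustration under assumptions, not the argument of the text: it uses a hypothetical finite population of five values, samples of two drawn without replacement with every pair equally likely, and invented function names. It tabulates the exact joint frequencies of the mean and the squared standard deviation, and then tests whether each joint frequency equals the product of its marginals.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def joint_distribution(population, n):
    """Exact joint frequencies of (mean deviation, sigma^2) over all samples of size n."""
    pop_mean = Fraction(sum(population), len(population))
    counts = Counter()
    for sample in combinations(population, n):
        mean = Fraction(sum(sample), n)
        sigma2 = sum((v - mean) ** 2 for v in sample) / n
        counts[(mean - pop_mean, sigma2)] += 1
    total = sum(counts.values())
    return {pair: Fraction(c, total) for pair, c in counts.items()}


def covariance(joint):
    """Covariance of the two coordinates under the joint frequencies."""
    e_x = sum(p * x for (x, _), p in joint.items())
    e_s = sum(p * s for (_, s), p in joint.items())
    return sum(p * (x - e_x) * (s - e_s) for (x, s), p in joint.items())


def factorises(joint):
    """True only if every joint frequency equals the product of its marginals."""
    px, ps = Counter(), Counter()
    for (x, s), p in joint.items():
        px[x] += p
        ps[s] += p
    return all(joint.get((x, s), 0) == px[x] * ps[s] for x in px for s in ps)


symmetric = [-2, -1, 0, 1, 2]  # hypothetical symmetrical population
joint = joint_distribution(symmetric, 2)
print(covariance(joint), factorises(joint))
```

Here the covariance is exactly zero, in agreement with (66) when $\mu_3 = 0$, while the joint frequencies do not factorise, so the product $f(\bar{x}) \times \phi(\sigma)$ would not reproduce the frequency surface.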





* Cf. Biometrika, Vol. IX. p. 7, Equation (xxvi).